  1. CouchDB
  2. COUCHDB-2735

Duplicate document _ids created under high edit load

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.7.0, 1.6.2
    • Database Core
    • Security Level: public (Regular issues)
    • None

    Description

      Our database was created under CouchDB 1.2.1 and has been upgraded through 1.3.1 to 1.6.1. We have been running 1.6.1 since last September.

      We are finding that making a large number of edits to existing documents is causing duplicated document _ids to be created in the _all_docs view:

      $ curl -X GET http://127.0.0.1:5984/a2/_all_docs?key=\"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd\"
        {"total_rows":11670,"offset":10577,"rows":[
        {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}},
        {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"14-984492669d302229de0fff2e1c0e4696"}}
        ]}
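
      A minimal sketch of how the duplication shown above could be detected programmatically, assuming the _all_docs response has already been fetched and parsed into a Python dict. The function name find_duplicate_ids is hypothetical and is not part of CouchDB or any client library; the sample data is copied from the response above.

```python
from collections import defaultdict


def find_duplicate_ids(all_docs_response):
    """Return {doc_id: [rev, ...]} for every id that appears in more than one row.

    `all_docs_response` is assumed to be the parsed JSON body of an
    _all_docs request (a dict with a "rows" list).
    """
    revs_by_id = defaultdict(list)
    for row in all_docs_response.get("rows", []):
        revs_by_id[row["id"]].append(row["value"]["rev"])
    # A healthy database lists each _id exactly once.
    return {doc_id: revs for doc_id, revs in revs_by_id.items() if len(revs) > 1}


# Sample taken from the _all_docs output in this report.
sample = {
    "total_rows": 11670,
    "offset": 10577,
    "rows": [
        {"id": "vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd",
         "key": "vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd",
         "value": {"rev": "49-c2aa999386dbf20e3a88b72cccb678e0"}},
        {"id": "vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd",
         "key": "vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd",
         "value": {"rev": "14-984492669d302229de0fff2e1c0e4696"}},
    ],
}

duplicates = find_duplicate_ids(sample)
```

      Running the same check over a full _all_docs listing (without the key parameter) would show how many documents are affected before compaction.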

      Compacting the database will resolve this.

      $ curl -X POST http://admin:password@127.0.0.1:5984/a2/_compact -H "Content-type: application/json" -d '{}'
      $ curl -X GET http://127.0.0.1:5984/a2/_all_docs?key=\"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd\"
        {"total_rows":11656,"offset":10564,"rows":[
        {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}}
        ]}

      The document is not in conflict at its starting revision, and no databases use this database as a replication target, so the problematic document cannot have been written to via replication. For example, curl -X GET 'http://127.0.0.1:5984/a000prodmaster/vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd?conflicts=true&deleted_conflicts=true' just returns the document.

      Our edit process consists of a number of view functions and update handlers, connected by Python code, that add extra document fields. We expect many documents to appear in multiple views, so document update conflicts are anticipated and handled in the Python code. Some of the edits are return([modified_doc, response]); others are return([null, modified_doc]), which are collected and submitted as bulk saves (all_or_nothing=false).
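
      As an illustration only, the conflict handling for the bulk saves might look something like the sketch below. It is not our actual code: the function name partition_bulk_results is hypothetical, and it assumes the _bulk_docs response has already been parsed into a list of per-document results, each carrying an "id" plus either "rev" (saved) or "error" (for example "conflict").

```python
def partition_bulk_results(docs, results):
    """Split submitted docs into (saved, conflicted, failed) lists.

    `docs` are the documents sent to _bulk_docs (each must have an "_id").
    `results` is the parsed JSON response, matched to the docs by id.
    """
    docs_by_id = {doc["_id"]: doc for doc in docs}
    saved, conflicted, failed = [], [], []
    for result in results:
        doc = docs_by_id[result["id"]]
        if "error" not in result:
            # Saved: keep a copy carrying the new revision for later edits.
            saved.append(dict(doc, _rev=result["rev"]))
        elif result["error"] == "conflict":
            # Expected when a document appears in several views; the caller
            # would re-read the document and reapply the edit.
            conflicted.append(doc)
        else:
            failed.append(doc)
    return saved, conflicted, failed


# Illustrative data, not taken from the affected database.
docs = [
    {"_id": "a", "_rev": "1-x", "n": 1},
    {"_id": "b", "_rev": "3-y", "n": 2},
]
results = [
    {"ok": True, "id": "a", "rev": "2-z"},
    {"id": "b", "error": "conflict", "reason": "Document update conflict."},
]

saved, conflicted, failed = partition_bulk_results(docs, results)
```

      Matching results to documents by id, rather than by position, avoids relying on the response order.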

      When a document _id is duplicated, it appears that views are calculated using the older revision while modifications are written to the newer revision.

      I am experiencing this regularly while testing an upgrade for a database containing ~12000 documents, which triggers ~26000 edits. This upgrade test runs on a separate machine, also running CouchDB 1.6.1 and Erlang 18, but the same behaviour was observed with Erlang 17.5.

      This issue appears similar to COUCHDB-968, but we have never run the versions that it affected.

          People

            kocolosk Adam Kocoloski
            jkdjira James Dingwall
            Votes: 0
            Watchers: 6