Uploaded image for project: 'CouchDB'
  1. CouchDB
  2. COUCHDB-2735

Duplicate document _ids created under high edit load

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0, 1.6.2
    • Component/s: Database Core
    • Security Level: public (Regular issues)
    • Labels:
      None

      Description

      Our database was created under CouchDB 1.2.1 and has been upgraded through 1.3.1 to 1.6.1. We have been running 1.6.1 since last September.

      We are finding that making a large number of edits to existing documents is causing duplicated document _ids to be created in the _all_docs view:

      1. curl -X GET http://127.0.0.1:5984/a2/_all_docs?key=\"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd\"
        {"total_rows":11670,"offset":10577,"rows":[
        {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}},
        {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"14-984492669d302229de0fff2e1c0e4696"}}
        ]}

      Compacting the database will resolve this.

      1. curl -X POST http://admin:password@127.0.0.1:5984/a2/_compact -H "Content-type: application/json" -d '{}'
      1. curl -X GET http://127.0.0.1:5984/a2/_all_docs?key=\"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd\"
        {"total_rows":11656,"offset":10564,"rows":[
        {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}}
        ]}

      The document is not in conflict at its starting revision and no databases have this database as a target which would cause the problematic document to be written to via replications. i.e. curl -X GET 'http://127.0.0.1:5984/a000prodmaster/vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd?conflicts=true&deleted_conflicts=true' just returns the document.

      Our edit process consists of a number of view functions and update handlers which are connected by python code to add extra document fields. We expect that many documents will come up in multiple views so document update conflicts are anticipated and handled in the python code. Some of the edits are return([modified_doc, response]) others are return([null, modified_doc]) which are collected and submitted as bulk saves (all_or_nothing=false).

      When a document _id is duplicated it appears that that views are calculated using the older revision while modifications are written to the newer revision.

      I am experiencing this regularly while testing an upgrade for a database containing ~12000 documents and which will trigger ~26000 edits. This upgrade test is on is a separate machine also running CouchDB 1.6.1 and Erlang 18 but the same was observed with 17.5.

      This issue appears similar to COUCHDB-968 but we have never run the versions that this affected.

        Activity

        Hide
        kocolosk Adam Kocoloski added a comment -

        Oof, COUCHDB-968 is one of those tickets I still remember by number – let's hope we don't go there again. Thanks for this exemplary bug report.

        When you observe a duplicated ID in _all_docs, do you also see the ID multiple times in a download of the _changes feed?

        Show
        kocolosk Adam Kocoloski added a comment - Oof, COUCHDB-968 is one of those tickets I still remember by number – let's hope we don't go there again. Thanks for this exemplary bug report. When you observe a duplicated ID in _all_docs, do you also see the ID multiple times in a download of the _changes feed?
        Hide
        jkdjira James Dingwall added a comment - - edited
        # curl -X GET http://127.0.0.1:5984/a2/_changes 2>/dev/null | grep vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd
        {"seq":209523,"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","changes":[{"rev":"14-984492669d302229de0fff2e1c0e4696"}]},
        {"seq":223726,"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","changes":[{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}]},
        
        Show
        jkdjira James Dingwall added a comment - - edited # curl -X GET http://127.0.0.1:5984/a2/_changes 2>/dev/null | grep vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd {"seq":209523,"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","changes":[{"rev":"14-984492669d302229de0fff2e1c0e4696"}]}, {"seq":223726,"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","changes":[{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}]},
        Hide
        jkdjira James Dingwall added a comment - - edited

        I have created a test case to demonstrate this problem: https://gist.github.com/JKDingwall/db3706d4a86c1743bbb8

        I was not able to create the duplicate _ids until the validate_doc_update function was added to the design document so perhaps that affects the timing or code path.

        (Usually once the count remaining is down to ~10000 for each editor thread the problem has manifested but you can watch in Futon to see once the document count is >12008 for the db.)

        Show
        jkdjira James Dingwall added a comment - - edited I have created a test case to demonstrate this problem: https://gist.github.com/JKDingwall/db3706d4a86c1743bbb8 I was not able to create the duplicate _ids until the validate_doc_update function was added to the design document so perhaps that affects the timing or code path. (Usually once the count remaining is down to ~10000 for each editor thread the problem has manifested but you can watch in Futon to see once the document count is >12008 for the db.)
        Hide
        jkdjira James Dingwall added a comment -

        I removed the validate_doc_update functions from our real system and tried my upgrade testing which was successful so the presence/absence of these definitely seems to be a factor.

        Show
        jkdjira James Dingwall added a comment - I removed the validate_doc_update functions from our real system and tried my upgrade testing which was successful so the presence/absence of these definitely seems to be a factor.
        Hide
        rnewson Robert Newson added a comment -

        Thanks, that's very helpful.

        Show
        rnewson Robert Newson added a comment - Thanks, that's very helpful.
        Hide
        kocolosk Adam Kocoloski added a comment -

        Yes, thanks - I was able to use that script to generate a database containing duplicates.

        Show
        kocolosk Adam Kocoloski added a comment - Yes, thanks - I was able to use that script to generate a database containing duplicates.
        Hide
        kocolosk Adam Kocoloski added a comment -

        Dropping into the shell I can see that the duplicates I generated each shared a common "1-..." revision:

        [#full_doc_info{id = <<"doc-508387df-ebb2-42da-af5d-371f5e48da18">>,
                        update_seq = 16869,deleted = false,
                        rev_tree = [{1,
                                     {<<68,176,131,68,251,221,250,201,130,38,108,87,143,93,240,
                                        123>>,
                                      {false,104275677,11360,6644},
                                      [{<<100,159,42,209,106,44,224,26,96,50,16,44,23,36,209,148>>,
                                        {false,175623139,16869,7652},
                                        []}]}}],
                        leafs_size = 7652},
         #full_doc_info{id = <<"doc-508387df-ebb2-42da-af5d-371f5e48da18">>,
                        update_seq = 16922,deleted = false,
                        rev_tree = [{1,
                                     {<<68,176,131,68,251,221,250,201,130,38,108,87,143,93,240,
                                        123>>,
                                      {false,104275677,11360,6644},
                                      [{<<254,225,162,250,53,143,142,193,7,168,167,202,245,63,
                                          171,158>>,
                                        {false,175493986,16852,6618},
                                        [{<<224,175,106,204,176,86,135,220,82,18,155,215,17,251,
                                            96,0>>,
                                          {false,175926477,16892,6627},
                                          [{<<49,247,214,60,110,121,122,50,36,70,134,216,236,142,
                                              104,240>>,
                                            {false,176061374,16903,6638},
                                            [{<<30,151,222,200,82,185,33,221,68,137,79,255,128,199,
                                                133,57>>,
                                              {false,176308382,16922,6646},
                                              []}]}]}]}]}}],
                        leafs_size = 6646}]
        

        and

        [#full_doc_info{id = <<"doc-38c48fcc-7b78-4b25-aeb1-5a7ca1415b78">>,
                        update_seq = 19238,deleted = false,
                        rev_tree = [{1,
                                     {<<227,195,74,105,207,226,38,206,118,136,51,57,248,95,86,
                                        10>>,
                                      {false,10684476,1258,8465},
                                      [{<<84,124,124,97,121,69,231,157,109,221,27,105,2,111,185,
                                          166>>,
                                        {false,205542469,19238,8395},
                                        []}]}}],
                        leafs_size = 8395},
         #full_doc_info{id = <<"doc-38c48fcc-7b78-4b25-aeb1-5a7ca1415b78">>,
                        update_seq = 19317,deleted = false,
                        rev_tree = [{1,
                                     {<<227,195,74,105,207,226,38,206,118,136,51,57,248,95,86,
                                        10>>,
                                      {false,10684476,1258,8465},
                                      [{<<39,22,93,28,77,211,104,15,179,97,81,175,184,114,246,27>>,
                                        {false,205421793,19222,8437},
                                        [{<<25,98,191,214,254,75,197,79,250,194,199,167,89,189,
                                            156,242>>,
                                          {false,205828209,19258,8443},
                                          [{<<184,171,176,153,99,62,9,245,202,182,251,239,165,213,
                                              135,54>>,
                                            {false,206634530,19317,8456},
                                            []}]}]}]}}],
                        leafs_size = 8456}]
        
        Show
        kocolosk Adam Kocoloski added a comment - Dropping into the shell I can see that the duplicates I generated each shared a common "1-..." revision: [#full_doc_info{id = << "doc-508387df-ebb2-42da-af5d-371f5e48da18" >>, update_seq = 16869,deleted = false , rev_tree = [{1, {<<68,176,131,68,251,221,250,201,130,38,108,87,143,93,240, 123>>, { false ,104275677,11360,6644}, [{<<100,159,42,209,106,44,224,26,96,50,16,44,23,36,209,148>>, { false ,175623139,16869,7652}, []}]}}], leafs_size = 7652}, #full_doc_info{id = << "doc-508387df-ebb2-42da-af5d-371f5e48da18" >>, update_seq = 16922,deleted = false , rev_tree = [{1, {<<68,176,131,68,251,221,250,201,130,38,108,87,143,93,240, 123>>, { false ,104275677,11360,6644}, [{<<254,225,162,250,53,143,142,193,7,168,167,202,245,63, 171,158>>, { false ,175493986,16852,6618}, [{<<224,175,106,204,176,86,135,220,82,18,155,215,17,251, 96,0>>, { false ,175926477,16892,6627}, [{<<49,247,214,60,110,121,122,50,36,70,134,216,236,142, 104,240>>, { false ,176061374,16903,6638}, [{<<30,151,222,200,82,185,33,221,68,137,79,255,128,199, 133,57>>, { false ,176308382,16922,6646}, []}]}]}]}]}}], leafs_size = 6646}] and [#full_doc_info{id = << "doc-38c48fcc-7b78-4b25-aeb1-5a7ca1415b78" >>, update_seq = 19238,deleted = false , rev_tree = [{1, {<<227,195,74,105,207,226,38,206,118,136,51,57,248,95,86, 10>>, { false ,10684476,1258,8465}, [{<<84,124,124,97,121,69,231,157,109,221,27,105,2,111,185, 166>>, { false ,205542469,19238,8395}, []}]}}], leafs_size = 8395}, #full_doc_info{id = << "doc-38c48fcc-7b78-4b25-aeb1-5a7ca1415b78" >>, update_seq = 19317,deleted = false , rev_tree = [{1, {<<227,195,74,105,207,226,38,206,118,136,51,57,248,95,86, 10>>, { false ,10684476,1258,8465}, [{<<39,22,93,28,77,211,104,15,179,97,81,175,184,114,246,27>>, { false ,205421793,19222,8437}, [{<<25,98,191,214,254,75,197,79,250,194,199,167,89,189, 156,242>>, { false ,205828209,19258,8443}, [{<<184,171,176,153,99,62,9,245,202,182,251,239,165,213, 135,54>>, { false ,206634530,19317,8456}, []}]}]}]}}], leafs_size = 8456}]
        Hide
        kocolosk Adam Kocoloski added a comment -

        So, it seems to me that if either `couch_db:group_alike_docs/1` or `couch_db_updater:merge_updates/3` failed to bucket updates to the same document ID correctly then the rest of the downstream update system could propagate duplicate entries into the by_id and by_seq indexes. This code is quite subtle and depends pretty critically on getting sorted lists in the right places, so we could be onto something here.

        I don't see a lot that directly implicates validation functions at the moment. Those functions are executed in the client process, not the db_updater process, so they wouldn't directly change the average occupancy that the db_updater would see in its merge_updates runs. There might be implicit relationships – the couchjs processes were certainly eating a lot of my CPU during the test run.

        Will keep poking at this and see if I can break the merging functions in creative ways.

        Show
        kocolosk Adam Kocoloski added a comment - So, it seems to me that if either `couch_db:group_alike_docs/1` or `couch_db_updater:merge_updates/3` failed to bucket updates to the same document ID correctly then the rest of the downstream update system could propagate duplicate entries into the by_id and by_seq indexes. This code is quite subtle and depends pretty critically on getting sorted lists in the right places, so we could be onto something here. I don't see a lot that directly implicates validation functions at the moment. Those functions are executed in the client process, not the db_updater process, so they wouldn't directly change the average occupancy that the db_updater would see in its merge_updates runs. There might be implicit relationships – the couchjs processes were certainly eating a lot of my CPU during the test run. Will keep poking at this and see if I can break the merging functions in creative ways.
        Hide
        kocolosk Adam Kocoloski added a comment -

        I can confirm that if group_alike_docs fails to bucket docs with the same ID then we can trivially produce multiple entries with the same ID. E.g. if I disable the ID check like so

        diff --git a/src/couchdb/couch_db.erl b/src/couchdb/couch_db.erl
        index 11ea0fd..2c5bac0 100644
        --- a/src/couchdb/couch_db.erl
        +++ b/src/couchdb/couch_db.erl
        @@ -451,7 +451,7 @@ group_alike_docs([Doc|Rest], []) ->
             group_alike_docs(Rest, [[Doc]]);
         group_alike_docs([{Doc,Ref}|Rest], [Bucket|RestBuckets]) ->
             [{#doc{id=BucketId},_Ref}|_] = Bucket,
        -    case Doc#doc.id == BucketId of
        +    case false of
             true ->
                 % add to existing bucket
                 group_alike_docs(Rest, [[{Doc,Ref}|Bucket]|RestBuckets]);
        

        then we're already broken

        $ curl 127.0.0.1:5984/jira2735/_bulk_docs -d '{"docs": [ {"_id":"bar", "a":1234}, {"_id":"bar", "a":5678} ]}' -H content-type:application/json
        [{"ok":true,"id":"bar","rev":"1-803a521ca22657d3b50cb27731964e8d"},{"ok":true,"id":"bar","rev":"1-d824233e1bf1016892c68255bae38e57"}]
        
        $ curl 127.0.0.1:5984/jira2735/_all_docs
        {"total_rows":2,"offset":0,"rows":[
        {"id":"bar","key":"bar","value":{"rev":"1-803a521ca22657d3b50cb27731964e8d"}},
        {"id":"bar","key":"bar","value":{"rev":"1-d824233e1bf1016892c68255bae38e57"}}
        ]}
        

        Will look to see if something similar occurs with the collect_updates code inside the db_updater, as the group_alike_docs code is only relevant for _bulk_docs loads and I have a feeling we would have seen this already if the breakage was there. This comment is really just proving out the theory that we don't have many deep checks for primary kay uniqueness in the codebase.

        Show
        kocolosk Adam Kocoloski added a comment - I can confirm that if group_alike_docs fails to bucket docs with the same ID then we can trivially produce multiple entries with the same ID. E.g. if I disable the ID check like so diff --git a/src/couchdb/couch_db.erl b/src/couchdb/couch_db.erl index 11ea0fd..2c5bac0 100644 --- a/src/couchdb/couch_db.erl +++ b/src/couchdb/couch_db.erl @@ -451,7 +451,7 @@ group_alike_docs([Doc|Rest], []) -> group_alike_docs(Rest, [[Doc]]); group_alike_docs([{Doc,Ref}|Rest], [Bucket|RestBuckets]) -> [{#doc{id=BucketId},_Ref}|_] = Bucket, - case Doc#doc.id == BucketId of + case false of true -> % add to existing bucket group_alike_docs(Rest, [[{Doc,Ref}|Bucket]|RestBuckets]); then we're already broken $ curl 127.0.0.1:5984/jira2735/_bulk_docs -d '{ "docs" : [ { "_id" : "bar" , "a" :1234}, { "_id" : "bar" , "a" :5678} ]}' -H content-type:application/json [{ "ok" : true , "id" : "bar" , "rev" : "1-803a521ca22657d3b50cb27731964e8d" },{ "ok" : true , "id" : "bar" , "rev" : "1-d824233e1bf1016892c68255bae38e57" }] $ curl 127.0.0.1:5984/jira2735/_all_docs { "total_rows" :2, "offset" :0, "rows" :[ { "id" : "bar" , "key" : "bar" , "value" :{ "rev" : "1-803a521ca22657d3b50cb27731964e8d" }}, { "id" : "bar" , "key" : "bar" , "value" :{ "rev" : "1-d824233e1bf1016892c68255bae38e57" }} ]} Will look to see if something similar occurs with the collect_updates code inside the db_updater, as the group_alike_docs code is only relevant for _bulk_docs loads and I have a feeling we would have seen this already if the breakage was there. This comment is really just proving out the theory that we don't have many deep checks for primary kay uniqueness in the codebase.
        Hide
        kocolosk Adam Kocoloski added a comment - - edited

        Last night I tried disabling the merging of updates from different clients and was unable to generate any duplicates during an overnight test run. After a bit more reasoning through the codebase this morning I believe I've tracked this down.

        The merge_updates function in couch_db_updater requires that input document groups are sorted by document ID. If it receives unsorted groups where the same ID is present in both groups it can fail to merge the groups correctly and the result will be duplicate IDs in the database.

        Now, the couch_db:group_alike_docs function generates sorted document groups, but the prep_and_validate_updates function reverses the order of the buckets. That function is only executed when one of three conditions is satisfied:

        1. at least one document being updated is a design document
        2. at least one document being updated contains attachments
        3. the database is configured with one or more validation functions

        This explains why the duplicates vanished when James removed the VDUs – the problematic function was bypassed altogether.

        So, how do we fix this? Of course we should add a reverse to the prep_and_validate functions so that they preserve the sort order. Sorting an already-sorted list is cheap, and given the consequences of working on unsorted input it would also be prudent for couch_db_updater to verify that the update requests are sorted before trying to merge them.

        All that aside, we desperately need better documentation and testing of this critical code path. Nowhere is it explicitly stated that the messages sent to couch_db_updater must sort the document groups beforehand. Many of these functions are quite testable, and in fact would lend themselves nicely to QuickCheck-style property-based testing should we decide to go that route.

        Incidentally, it looks like the 2.0 codebase does not suffer from this bug because it ended up incorporating an alternative fix for COUCHDB-911 that had the side effect of preserving the sorting correctly. The 2.0 updater process could still use the additional check for sorted input that I mentioned above.

        Show
        kocolosk Adam Kocoloski added a comment - - edited Last night I tried disabling the merging of updates from different clients and was unable to generate any duplicates during an overnight test run. After a bit more reasoning through the codebase this morning I believe I've tracked this down. The merge_updates function in couch_db_updater requires that input document groups are sorted by document ID. If it receives unsorted groups where the same ID is present in both groups it can fail to merge the groups correctly and the result will be duplicate IDs in the database. Now, the couch_db:group_alike_docs function generates sorted document groups, but the prep_and_validate_updates function reverses the order of the buckets. That function is only executed when one of three conditions is satisfied: at least one document being updated is a design document at least one document being updated contains attachments the database is configured with one or more validation functions This explains why the duplicates vanished when James removed the VDUs – the problematic function was bypassed altogether. So, how do we fix this? Of course we should add a reverse to the prep_and_validate functions so that they preserve the sort order. Sorting an already-sorted list is cheap, and given the consequences of working on unsorted input it would also be prudent for couch_db_updater to verify that the update requests are sorted before trying to merge them. All that aside, we desperately need better documentation and testing of this critical code path. Nowhere is it explicitly stated that the messages sent to couch_db_updater must sort the document groups beforehand. Many of these functions are quite testable, and in fact would lend themselves nicely to QuickCheck-style property-based testing should we decide to go that route. Incidentally, it looks like the 2.0 codebase does not suffer from this bug because it ended up incorporating an alternative fix for COUCHDB-911 that had the side effect of preserving the sorting correctly. The 2.0 updater process could still use the additional check for sorted input that I mentioned above.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit fb696d71b45f4448cdd956262a01f7e39231e5d8 in couchdb's branch refs/heads/1.6.x from Adam Kocoloski
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=fb696d7 ]

        Preserve bucket ordering during validation

        Document buckets are sorted by docid, but the validation code was
        reversing the buckets. If multiple clients send concurrent updates for
        the same document the broken sorting can result in duplicate documents.

        COUCHDB-2735

        Show
        jira-bot ASF subversion and git services added a comment - Commit fb696d71b45f4448cdd956262a01f7e39231e5d8 in couchdb's branch refs/heads/1.6.x from Adam Kocoloski [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=fb696d7 ] Preserve bucket ordering during validation Document buckets are sorted by docid, but the validation code was reversing the buckets. If multiple clients send concurrent updates for the same document the broken sorting can result in duplicate documents. COUCHDB-2735
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 5b1b3e155dd2909db75bed799f40f97c29410b19 in couchdb's branch refs/heads/2735-duplicate-docs from Adam Kocoloski
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=5b1b3e1 ]

        Preserve bucket ordering during validation

        Document buckets are sorted by docid, but the validation code was
        reversing the buckets. If multiple clients send concurrent updates for
        the same document the broken sorting can result in duplicate documents.

        The particular structure of the patch here is chosen to match the 2.x
        codebase.

        COUCHDB-2735

        Show
        jira-bot ASF subversion and git services added a comment - Commit 5b1b3e155dd2909db75bed799f40f97c29410b19 in couchdb's branch refs/heads/2735-duplicate-docs from Adam Kocoloski [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=5b1b3e1 ] Preserve bucket ordering during validation Document buckets are sorted by docid, but the validation code was reversing the buckets. If multiple clients send concurrent updates for the same document the broken sorting can result in duplicate documents. The particular structure of the patch here is chosen to match the 2.x codebase. COUCHDB-2735
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 47d7b05fa63cb77ed7852a5d20f86720e6ac8de1 in couchdb's branch refs/heads/2735-duplicate-docs from Adam Kocoloski
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=47d7b05 ]

        Ensure doc groups are sorted before merging them

        We had been implicily assuming that clients send us sorted groups, but
        unsurprisingly that's not always the case. The additional sorting here
        should be redundant, but the consequences of merging unsorted groups are
        severe – we can end up with uniqueness violations on the primary key in
        the database – and so we add an additional sort here.

        COUCHDB-2735

        Show
        jira-bot ASF subversion and git services added a comment - Commit 47d7b05fa63cb77ed7852a5d20f86720e6ac8de1 in couchdb's branch refs/heads/2735-duplicate-docs from Adam Kocoloski [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=47d7b05 ] Ensure doc groups are sorted before merging them We had been implicily assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here. COUCHDB-2735
        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user kocolosk opened a pull request:

        https://github.com/apache/couchdb/pull/327

        Ensure PK uniqueness when merging updates from multiple clients

        We had been implicily assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The PR here fixes a case where we broke the sorting, and adds an additional sort inside the `couch_db_updater` server loop since the consequences of screwing this up are so severe.

        See COUCHDB-2735(https://issues.apache.org/jira/browse/COUCHDB-2735).

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/apache/couchdb 2735-duplicate-docs

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/couchdb/pull/327.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #327


        commit 61d33cb64124535571e61e6ba1b5f353fb297a40
        Author: Klaus Trainer <klaus_trainer@apache.org>
        Date: 2014-10-27T10:55:14Z

        Improve documentation of `cacert_file` ssl option

        The documentation was incorrect insofar that it only described its
        functionality for client verification, although the configuration is
        used for server verification as well.

        commit 46f7b4e7e1e34f427b333fd97ddb02e475848607
        Author: Robert Newson <rnewson@apache.org>
        Date: 2015-05-05T14:03:44Z

        s/afrikan/afrikaans/g

        commit 95cb436be30305efa091809813b64ef31af968c8
        Author: Dave Cottlehuber <dch@apache.org>
        Date: 2015-06-26T08:31:27Z

        build: support OTP-18.0

        commit 5b1b3e155dd2909db75bed799f40f97c29410b19
        Author: Adam Kocoloski <adam@cloudant.com>
        Date: 2015-07-15T21:06:18Z

        Preserve bucket ordering during validation

        Document buckets are sorted by docid, but the validation code was
        reversing the buckets. If multiple clients send concurrent updates for
        the same document the broken sorting can result in duplicate documents.

        The particular structure of the patch here is chosen to match the 2.x
        codebase.

        COUCHDB-2735

        commit 47d7b05fa63cb77ed7852a5d20f86720e6ac8de1
        Author: Adam Kocoloski <adam@cloudant.com>
        Date: 2015-07-17T23:20:36Z

        Ensure doc groups are sorted before merging them

        We had been implicily assuming that clients send us sorted groups, but
        unsurprisingly that's not always the case. The additional sorting here
        should be redundant, but the consequences of merging unsorted groups are
        severe – we can end up with uniqueness violations on the primary key in
        the database – and so we add an additional sort here.

        COUCHDB-2735


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user kocolosk opened a pull request: https://github.com/apache/couchdb/pull/327 Ensure PK uniqueness when merging updates from multiple clients We had been implicily assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The PR here fixes a case where we broke the sorting, and adds an additional sort inside the `couch_db_updater` server loop since the consequences of screwing this up are so severe. See COUCHDB-2735 ( https://issues.apache.org/jira/browse/COUCHDB-2735 ). You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/couchdb 2735-duplicate-docs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/couchdb/pull/327.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #327 commit 61d33cb64124535571e61e6ba1b5f353fb297a40 Author: Klaus Trainer <klaus_trainer@apache.org> Date: 2014-10-27T10:55:14Z Improve documentation of `cacert_file` ssl option The documentation was incorrect insofar that it only described its functionality for client verification, although the configuration is used for server verification as well. commit 46f7b4e7e1e34f427b333fd97ddb02e475848607 Author: Robert Newson <rnewson@apache.org> Date: 2015-05-05T14:03:44Z s/afrikan/afrikaans/g commit 95cb436be30305efa091809813b64ef31af968c8 Author: Dave Cottlehuber <dch@apache.org> Date: 2015-06-26T08:31:27Z build: support OTP-18.0 commit 5b1b3e155dd2909db75bed799f40f97c29410b19 Author: Adam Kocoloski <adam@cloudant.com> Date: 2015-07-15T21:06:18Z Preserve bucket ordering during validation Document buckets are sorted by docid, but the validation code was reversing the buckets. If multiple clients send concurrent updates for the same document the broken sorting can result in duplicate documents. The particular structure of the patch here is chosen to match the 2.x codebase. COUCHDB-2735 commit 47d7b05fa63cb77ed7852a5d20f86720e6ac8de1 Author: Adam Kocoloski <adam@cloudant.com> Date: 2015-07-17T23:20:36Z Ensure doc groups are sorted before merging them We had been implicily assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here. COUCHDB-2735
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 9f3e93da931db4069ca84a6b94c2b60c1b4c8440 in couchdb's branch refs/heads/1.6.x from Adam Kocoloski
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=9f3e93d ]

        Ensure doc groups are sorted before merging them

        We had been implicily assuming that clients send us sorted groups, but
        unsurprisingly that's not always the case. The additional sorting here
        should be redundant, but the consequences of merging unsorted groups are
        severe – we can end up with uniqueness violations on the primary key in
        the database – and so we add an additional sort here.

        COUCHDB-2735

        Show
        jira-bot ASF subversion and git services added a comment - Commit 9f3e93da931db4069ca84a6b94c2b60c1b4c8440 in couchdb's branch refs/heads/1.6.x from Adam Kocoloski [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=9f3e93d ] Ensure doc groups are sorted before merging them We had been implicily assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here. COUCHDB-2735
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit cb4f1aa073a6add3b4a6215ab2d253bff8e8c961 in couchdb-couch's branch refs/heads/2735-duplicate-docs from Adam Kocoloski
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=cb4f1aa ]

        Ensure doc groups are sorted before merging them

        We had been implicily assuming that clients send us sorted groups, but
        unsurprisingly that's not always the case. The additional sorting here
        should be redundant, but the consequences of merging unsorted groups are
        severe – we can end up with uniqueness violations on the primary key in
        the database – and so we add an additional sort here.

        COUCHDB-2735

        Show
        jira-bot ASF subversion and git services added a comment - Commit cb4f1aa073a6add3b4a6215ab2d253bff8e8c961 in couchdb-couch's branch refs/heads/2735-duplicate-docs from Adam Kocoloski [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=cb4f1aa ] Ensure doc groups are sorted before merging them We had been implicily assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here. COUCHDB-2735
        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user kocolosk opened a pull request:

        https://github.com/apache/couchdb-couch/pull/76

        Ensure doc groups are sorted before merging them

        We had been implicily assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here.

        COUCHDB-2735

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/apache/couchdb-couch 2735-duplicate-docs

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/couchdb-couch/pull/76.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #76


        commit cb4f1aa073a6add3b4a6215ab2d253bff8e8c961
        Author: Adam Kocoloski <adam@cloudant.com>
        Date: 2015-07-18T11:49:00Z

        Ensure doc groups are sorted before merging them

        We had been implicily assuming that clients send us sorted groups, but
        unsurprisingly that's not always the case. The additional sorting here
        should be redundant, but the consequences of merging unsorted groups are
        severe – we can end up with uniqueness violations on the primary key in
        the database – and so we add an additional sort here.

        COUCHDB-2735


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user kocolosk opened a pull request: https://github.com/apache/couchdb-couch/pull/76 Ensure doc groups are sorted before merging them We had been implicily assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here. COUCHDB-2735 You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/couchdb-couch 2735-duplicate-docs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/couchdb-couch/pull/76.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #76 commit cb4f1aa073a6add3b4a6215ab2d253bff8e8c961 Author: Adam Kocoloski <adam@cloudant.com> Date: 2015-07-18T11:49:00Z Ensure doc groups are sorted before merging them We had been implicily assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here. COUCHDB-2735
        Hide
        kocolosk Adam Kocoloski added a comment -

        I've submitted separate PRs against 1.x.x and master:

        https://github.com/apache/couchdb/pull/327
        https://github.com/apache/couchdb-couch/pull/76

        I accidentally committed the fix directly to 1.6.x; if there should prove to be any problem with the 1.x.x PR I'll be sure to address it on 1.6.x as well. I ran James' test case overnight and could not produce any primary key violations with this patch deployed.

        The additional sort of the document groups inside the db_updater is not free, but I clocked it at about 800 µs for groups of 1000 docs.

        Show
        kocolosk Adam Kocoloski added a comment - I've submitted separate PRs against 1.x.x and master: https://github.com/apache/couchdb/pull/327 https://github.com/apache/couchdb-couch/pull/76 I accidentally committed the fix directly to 1.6.x; if there should prove to be any problem with the 1.x.x PR I'll be sure to address it on 1.6.x as well. I ran James' test case overnight and could not produce any primary key violations with this patch deployed. The additional sort of the document groups inside the db_updater is not free, but I clocked it at about 800 µs for groups of 1000 docs.
        Hide
        kocolosk Adam Kocoloski added a comment -

        I don't see a FixFor 1.6.2 anywhere - do we need one?

        Show
        kocolosk Adam Kocoloski added a comment - I don't see a FixFor 1.6.2 anywhere - do we need one?
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 80503e255e4ffad8a3c055f4332383f34c8dab2a in couchdb-couch's branch refs/heads/2735-duplicate-docs from Adam Kocoloski
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=80503e2 ]

        Ensure doc groups are sorted before merging them

        We had been implicitly assuming that clients send us sorted groups, but
        unsurprisingly that's not always the case. The additional sorting here
        should be redundant, but the consequences of merging unsorted groups are
        severe – we can end up with uniqueness violations on the primary key in
        the database – and so we add an additional sort here.

        COUCHDB-2735

        Show
        jira-bot ASF subversion and git services added a comment - Commit 80503e255e4ffad8a3c055f4332383f34c8dab2a in couchdb-couch's branch refs/heads/2735-duplicate-docs from Adam Kocoloski [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=80503e2 ] Ensure doc groups are sorted before merging them We had been implicitly assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here. COUCHDB-2735
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 50ada24b120c274b7d959ffe8e780055816b662e in couchdb's branch refs/heads/2735-duplicate-docs from Adam Kocoloski
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=50ada24 ]

        Ensure doc groups are sorted before merging them

        We had been implicitly assuming that clients send us sorted groups, but
        unsurprisingly that's not always the case. The additional sorting here
        should be redundant, but the consequences of merging unsorted groups are
        severe – we can end up with uniqueness violations on the primary key in
        the database – and so we add an additional sort here.

        COUCHDB-2735

        Show
        jira-bot ASF subversion and git services added a comment - Commit 50ada24b120c274b7d959ffe8e780055816b662e in couchdb's branch refs/heads/2735-duplicate-docs from Adam Kocoloski [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=50ada24 ] Ensure doc groups are sorted before merging them We had been implicitly assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here. COUCHDB-2735
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 5b1b3e155dd2909db75bed799f40f97c29410b19 in couchdb's branch refs/heads/1.x.x from Adam Kocoloski
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=5b1b3e1 ]

        Preserve bucket ordering during validation

        Document buckets are sorted by docid, but the validation code was
        reversing the buckets. If multiple clients send concurrent updates for
        the same document the broken sorting can result in duplicate documents.

        The particular structure of the patch here is chosen to match the 2.x
        codebase.

        COUCHDB-2735

        Show
        jira-bot ASF subversion and git services added a comment - Commit 5b1b3e155dd2909db75bed799f40f97c29410b19 in couchdb's branch refs/heads/1.x.x from Adam Kocoloski [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=5b1b3e1 ] Preserve bucket ordering during validation Document buckets are sorted by docid, but the validation code was reversing the buckets. If multiple clients send concurrent updates for the same document the broken sorting can result in duplicate documents. The particular structure of the patch here is chosen to match the 2.x codebase. COUCHDB-2735
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 50ada24b120c274b7d959ffe8e780055816b662e in couchdb's branch refs/heads/1.x.x from Adam Kocoloski
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=50ada24 ]

        Ensure doc groups are sorted before merging them

        We had been implicitly assuming that clients send us sorted groups, but
        unsurprisingly that's not always the case. The additional sorting here
        should be redundant, but the consequences of merging unsorted groups are
        severe – we can end up with uniqueness violations on the primary key in
        the database – and so we add an additional sort here.

        COUCHDB-2735

        Show
        jira-bot ASF subversion and git services added a comment - Commit 50ada24b120c274b7d959ffe8e780055816b662e in couchdb's branch refs/heads/1.x.x from Adam Kocoloski [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=50ada24 ] Ensure doc groups are sorted before merging them We had been implicitly assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here. COUCHDB-2735
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/couchdb/pull/327

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/couchdb/pull/327
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 80503e255e4ffad8a3c055f4332383f34c8dab2a in couchdb-couch's branch refs/heads/master from Adam Kocoloski
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=80503e2 ]

        Ensure doc groups are sorted before merging them

        We had been implicitly assuming that clients send us sorted groups, but
        unsurprisingly that's not always the case. The additional sorting here
        should be redundant, but the consequences of merging unsorted groups are
        severe – we can end up with uniqueness violations on the primary key in
        the database – and so we add an additional sort here.

        COUCHDB-2735

        Show
        jira-bot ASF subversion and git services added a comment - Commit 80503e255e4ffad8a3c055f4332383f34c8dab2a in couchdb-couch's branch refs/heads/master from Adam Kocoloski [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=80503e2 ] Ensure doc groups are sorted before merging them We had been implicitly assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe – we can end up with uniqueness violations on the primary key in the database – and so we add an additional sort here. COUCHDB-2735
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/couchdb-couch/pull/76

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/couchdb-couch/pull/76
        Hide
        kocolosk Adam Kocoloski added a comment -

        All PRs merged. Setting FixFor versions to only those where this bug was latent, although we also added some additional validation in the 2.0 branch to prevent a similar bug from occurring.

        James, I'll leave this open if you're interested to verify the fix against any of the patched branches.

        Show
        kocolosk Adam Kocoloski added a comment - All PRs merged. Setting FixFor versions to only those where this bug was latent, although we also added some additional validation in the 2.0 branch to prevent a similar bug from occurring. James, I'll leave this open if you're interested to verify the fix against any of the patched branches.
        Hide
        jkdjira James Dingwall added a comment -

        I have built 1.6.1 with:

        build: support OTP-18.0
        Ensure doc groups are sorted before merging them
        Preserve bucket ordering during validation

        I am no longer seeing the original problem in any of my tests. Thank you for your efforts to resolve this and also to @rnewson for the original assistance on #couchdb.

        Show
        jkdjira James Dingwall added a comment - I have built 1.6.1 with: build: support OTP-18.0 Ensure doc groups are sorted before merging them Preserve bucket ordering during validation I am no longer seeing the original problem in any of my tests. Thank you for your efforts to resolve this and also to @rnewson for the original assistance on #couchdb.
        Hide
        kocolosk Adam Kocoloski added a comment -

        Excellent, thank you again James for this stellar bug report.

        Show
        kocolosk Adam Kocoloski added a comment - Excellent, thank you again James for this stellar bug report.
        Hide
        mera Merlin Rabens added a comment -

        Hi,

        We're using CouchDB 1.6.1 for a year now and we're recently facing duplicate document id's issues. We don't have any validate_doc_update functions in our design documents but anyhow: There were a plenty of documents with the same document id but different revision no.'s. We have two systems that access our CouchDB nodes heavily via lightcouch.

        Any ideas?

        At first, it seemed, that this Issue describes exactly our problem but it doesn't since we haven't any validate_doc_update functions.

        Thanks in advance.

        Kind regards,
        Merlin

        Show
        mera Merlin Rabens added a comment - Hi, We're using CouchDB 1.6.1 for a year now and we're recently facing duplicate document id's issues. We don't have any validate_doc_update functions in our design documents but anyhow: There were a plenty of documents with the same document id but different revision no.'s. We have two systems that access our CouchDB nodes heavily via lightcouch. Any ideas? At first, it seemed, that this Issue describes exactly our problem but it doesn't since we haven't any validate_doc_update functions. Thanks in advance. Kind regards, Merlin

          People

          • Assignee:
            kocolosk Adam Kocoloski
            Reporter:
            jkdjira James Dingwall
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development