CouchDB / COUCHDB-1265

Replication can introduce duplicates into the seq_btree.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.1, 1.2
    • Component/s: Database Core
    • Labels: None
    • Skill Level: Guru Level (Everyone buy this person a beer at the next conference!)

      Description

      Full description, test, and patch to follow shortly.

      Attachments

      1. replication-frenzy.py (0.6 kB, Paul Joseph Davis)
      2. COUCHDB-1265.patch (7 kB, Paul Joseph Davis)

        Activity

        Paul Joseph Davis added a comment -

        Fix introduction of duplicates into _changes feed

        When a document is updated, the new update_seq is assigned as part
        of the rev_tree merging in couch_db_updater:merge_rev_trees/7,
        based on whether the new rev_tree is equal to the old tree. This
        equality check is a simple Erlang term comparison. If the trees are
        not equal, a new update_seq is assigned to the #full_doc_info{}
        record that is stored in fulldocinfo_by_id_btree.
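
        A minimal sketch of that decision (module and function names are
        made up; this is not the actual merge_rev_trees/7 source):

            -module(seq_assign_sketch).
            -export([maybe_new_seq/4]).

            %% A new update_seq is assigned whenever the merged tree differs
            %% from the old one under plain Erlang term comparison.
            maybe_new_seq(OldTree, NewTree, OldSeq, NextSeq) ->
                case NewTree == OldTree of
                    true  -> OldSeq;   %% tree untouched: keep the old seq
                    false -> NextSeq   %% any change, even internal, bumps it
                end.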

        During replication it is possible that a document update merges into
        the rev_tree internally without creating a leaf. If the terminal
        node of the replicated path happens to land on a node with a value
        of ?REV_MISSING, the new document information is preferred and
        replaces the ?REV_MISSING value.
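
        To illustrate (the tree shape below loosely follows couch_key_tree's
        {Pos, {Key, Value, SubTrees}} form, and ?REV_MISSING is redefined
        here only for self-containment):

            -module(internal_merge_sketch).
            -export([example/0]).

            -define(REV_MISSING, []).

            example() ->
                %% The replicated path terminates on the internal node "b",
                %% whose value was ?REV_MISSING. Filling in the real value
                %% changes the tree while the set of leaves ("c") stays the
                %% same.
                Old = [{1, {"a", a, [{"b", ?REV_MISSING, [{"c", c, []}]}]}}],
                New = [{1, {"a", a, [{"b", b_val, [{"c", c, []}]}]}}],
                Old == New.   %% false, so a new update_seq gets assigned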

        This preference causes the rev_tree comparison to evaluate to false,
        which gives this document a new update_seq. Up to this point
        everything is fine. We write a bit of extra data (which will be
        cleared during compaction) because of a previous bug where we
        decided to be cautious and avoid losing data due to a broken
        rev_tree merging algorithm. It is also important to note that this
        is the place where we calculate the update_seq to remove from
        docinfo_by_seq_btree.

        After this point we get back to couch_db_updater:update_docs_int/5,
        which eventually calls couch_db_updater:new_index_entries/3 to
        create the new entries for fulldocinfo_by_id_btree and
        docinfo_by_seq_btree. Here we create a #doc_info{} record based on
        the leaves in the rev_tree. As you recall, the update that caused
        the new update_seq was not a leaf, so we create a #doc_info{}
        record with an incorrect high_seq member pointing to the update_seq
        we are about to remove from docinfo_by_seq_btree (remember, we
        calculated the seq to remove before we consulted the leaves).
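
        A sketch of the flaw at this step (illustrative only, not the real
        new_index_entries/3):

            -module(high_seq_bug_sketch).
            -export([buggy_high_seq/1]).

            %% high_seq is derived from the leaves alone, so a seq bump
            %% recorded only on an internal node is invisible here: if the
            %% leaves still carry seq 7 while #full_doc_info.update_seq is
            %% already 8, the new #doc_info{} ends up with high_seq = 7.
            buggy_high_seq(LeafSeqs) ->
                lists:max(LeafSeqs).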

        The end result is that we remove the same update_seq we insert. This
        sets us up for the real bug. The next time we update this document
        the same procedure is applied, but at the point where we calculate
        the seq to remove from docinfo_by_seq_btree, we calculate the wrong
        value. Thus, when the update continues, we remove an update_seq
        that doesn't exist in the tree and insert our new seq. The seq from
        the previous update, however, is still in the tree, so
        docinfo_by_seq_btree now contains two entries pointing at the same
        document.
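
        A worked trace of those two updates (all sequence numbers are
        hypothetical):

            %% start: leaf seq = 7, docinfo_by_seq_btree keys = [7]
            %%
            %% update 1 (internal merge, no new leaf):
            %%   full_doc_info.update_seq := 8, leaves still say 7
            %%   seq to remove = 7, buggy #doc_info.high_seq = 7
            %%   remove 7, insert 7                -> by_seq keys = [7]
            %%
            %% update 2 (ordinary leaf update):
            %%   update_seq := 9, seq to remove computed as 8
            %%   remove 8 (not present!), insert 9 -> by_seq keys = [7, 9]
            %%
            %% Two entries now point at the same document.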

        At this point we run into the observed behavior of this bug: it
        ends up causing duplicate entries in views, which then throw errors
        when the view is compacted. These errors are also particularly
        nasty because they bubble up to the couch_view gen_server, which
        crashes and spiders out, crashing every couch_view_group process.
        That part probably isn't important though.

        There's a simple test included with the patch to illustrate the
        behavior and assert that it stays fixed.

        Fixes COUCHDB-1265

        Paul Joseph Davis added a comment -

        For the curious, this is the original script I wrote that triggered
        the bug. It requires that you set up three CouchDB servers on ports
        5984, 5985, and 5986.

        The basic idea is that it sets up continuous replication in both
        directions between all pairs of nodes, then hammers one node,
        updating the same document over and over again.

        Fairly quickly you can start seeing duplicates in the _changes feed.
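
        For reference, a hedged sketch of that recipe (the real reproduction
        is the attached replication-frenzy.py; the "db" database name and
        "doc" id below are illustrative):

            -module(frenzy_sketch).
            -export([run/0]).

            -define(NODES, ["http://127.0.0.1:5984",
                            "http://127.0.0.1:5985",
                            "http://127.0.0.1:5986"]).

            run() ->
                inets:start(),
                %% make sure the database exists on every node
                [httpc:request(put, {N ++ "/db", [], "application/json", ""},
                               [], []) || N <- ?NODES],
                %% continuous replication in both directions, all pairs
                [replicate(A, B) || A <- ?NODES, B <- ?NODES, A =/= B],
                hammer(hd(?NODES), 1000).

            replicate(Source, Target) ->
                Body = lists:flatten(io_lib:format(
                    "{\"source\":\"~s/db\",\"target\":\"~s/db\","
                    "\"continuous\":true}", [Source, Target])),
                httpc:request(post, {Source ++ "/_replicate", [],
                                     "application/json", Body}, [], []).

            %% hammer one node, rewriting the same doc over and over
            hammer(_Node, 0) ->
                ok;
            hammer(Node, N) ->
                Url = Node ++ "/db/doc",
                Doc = case httpc:request(get, {Url, []}, [], []) of
                    {ok, {{_, 200, _}, _, Resp}} ->
                        %% crude _rev extraction; a real script parses JSON
                        {match, [Rev]} =
                            re:run(Resp, "\"_rev\":\"([^\"]+)\"",
                                   [{capture, all_but_first, list}]),
                        "{\"_rev\":\"" ++ Rev ++ "\",\"n\":"
                            ++ integer_to_list(N) ++ "}";
                    _ ->
                        "{\"n\":" ++ integer_to_list(N) ++ "}"
                end,
                httpc:request(put, {Url, [], "application/json", Doc},
                              [], []),
                hammer(Node, N - 1).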

        Adam Kocoloski added a comment -

        Nice find, Paul. I'm not sure what Erlang version we require on trunk these days, but if we're still supporting R13 you'll want to use the fully-qualified erlang:max/2. Otherwise the patch looks solid.
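
        (The qualified form being suggested, with made-up argument names:)

            -module(max_note).
            -export([high_seq/2]).

            %% Per the note above, an unqualified max/2 can't be relied on
            %% when supporting R13, so the call is module-qualified.
            high_seq(TreeSeq, FdiSeq) ->
                erlang:max(TreeSeq, FdiSeq).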

        Paul Joseph Davis added a comment -

        Qualified the module for erlang:max/2

        Joan Touzet added a comment -

        This was my bug - thank you Mr. Davis! We will be testing today and providing feedback (most likely as a +1).

        Filipe Manana added a comment -

        Paul, looks good.

        Just a side note: due to upgrade code in trunk for the branch sizes, instead of matching against {_Deleted, _DiskPos, OldTreeSeq, _Size}, it would likely be better to call is_tuple(Value) and then element(3, Value) (or match against both 3- and 4-element tuples).
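
        A sketch of the two approaches being contrasted (the tuple shapes
        are assumptions inferred from the comment, not the actual trunk
        value layouts):

            -module(leaf_value_sketch).
            -export([seq_by_match/1, seq_by_element/1]).

            %% brittle: only matches the 4-element (post-upgrade) shape
            seq_by_match({_Deleted, _DiskPos, OldTreeSeq, _Size}) ->
                OldTreeSeq.

            %% tolerant: accepts both the 3- and 4-element tuples that the
            %% upgrade code can leave behind
            seq_by_element(Value) when is_tuple(Value),
                                       tuple_size(Value) >= 3 ->
                element(3, Value).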

        Paul Joseph Davis added a comment -

        @Filipe

        Good catch on that. Probably about time we start replacing some of these tuples with records that serialize to proplists for upgrades. But that's another ball of honey to catch snakes with.

        Joan Touzet added a comment -

        I'm +1 on this - now, I'm no longer seeing the Erlang traceback. It uncovered other issues (COUCHDB-1256 for one) but I'm chipping away at them too.

        THANK YOU!

        Paul Joseph Davis added a comment -

        This addresses the issue that Filipe spotted.

        If no one finds a bug in this over the next few hours I'll go ahead and commit and backport to 1.1.x

        James Howe added a comment -

        Presumably this bug could cause duplicate entries in views. Could it also cause views to contain deleted documents (that still had bodies which got them through the map)?

        Paul Joseph Davis added a comment -

        Not directly, no. For a doc to make it through to the map, one of its leaves would have to be not deleted, which would be visible when getting the doc directly.

        Paul Joseph Davis added a comment -

        Fixed in

        1.2.x as of 1164350
        1.1.x as of 1164351

        Paul Joseph Davis added a comment -

        Really fixed in:

        trunk r1176701
        1.2.x r1176703
        1.1.x r1176704

        As a follow-up to COUCHDB-1265: I was missing the fact that after
        the insertion of a new update_seq into an internal node, it is quite
        possible that a compaction runs before the doc is updated again.
        This is important because compaction removes information about the
        largest update seq from the tree itself.

        The fix is simply to include the update_seq from the
        #full_doc_info{} record when calculating #doc_info.high_seq. The
        way to think of this is that it's the maximum of all known values
        for the update sequence, which can be defined as all values known
        in the tree or in the full_doc_info record.
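
        A minimal sketch of that calculation (the record is trimmed to the
        field the comment mentions; this is not the verbatim patch):

            -module(high_seq_fix_sketch).
            -export([high_seq/2]).

            -record(full_doc_info, {id, update_seq = 0, rev_tree}).

            %% The fixed high_seq is the maximum over every known seq:
            %% the seqs found on the tree's leaves plus the update_seq
            %% stored in the #full_doc_info{} itself, which survives even
            %% when compaction strips the seq from the tree.
            high_seq(#full_doc_info{update_seq = FdiSeq}, LeafSeqs) ->
                lists:foldl(fun erlang:max/2, FdiSeq, LeafSeqs).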

        Paul Joseph Davis added a comment -

        Forgot to close this years ago.


          People

          • Assignee: Paul Joseph Davis
          • Reporter: Paul Joseph Davis
          • Votes: 0
          • Watchers: 1
