CouchDB
  1. CouchDB
  2. COUCHDB-462

track conflict count in db_info (was built-in conflicts view)

    Details

    • Type: Improvement Improvement
    • Status: Reopened
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: HTTP Interface
    • Labels:
      None
    • Skill Level:
      Committers Level (Medium to Hard)

      Description

      This patch adds a built-in _conflicts view indexed by document ID that looks like

      GET /dbname/_conflicts

      {"rows":[

      {"id":"foo", "rev":"1-1aa8851c9bb2777e11ba56e0bf768649", "conflicts":["1-bdc15320c0850d4ee90ff43d1d298d5d"]}

      ]}

      GET /dbname/_conflicts?deleted=true

      {"rows":[

      {"id":"bar", "rev":"5-dd31186f5aa11ebd47eb664fb342f1b1", "conflicts":["5-a0efbb1990c961a078dc5308d03b7044"], "deleted_conflicts":["3-bdc15320c0850d4ee90ff43d1d298d5d","2-cce334eeeb02d04870e37dac6d33198a"]}

      ,

      {"id":"baz", "rev":"2-eec205a9d413992850a6e32678485900", "deleted":true, "deleted_conflicts":["2-10009b36e28478b213e04e71c1e08beb"]}

      ]}

      As the HTTPd and view layers are a bit outside my specialty I figured I should ask for a Review before Commit.

      1. whitespace.diff
        7 kB
        Bob Dionne
      2. conflicts_in_db_info2.diff
        27 kB
        Bob Dionne
      3. conflicts_in_db_info.diff
        15 kB
        Bob Dionne
      4. COUCHDB-462-adam-updated.patch
        15 kB
        Adam Kocoloski
      5. 462-jan-2.patch
        6 kB
        Jan Lehnardt
      6. COUCHDB-462-jan.patch
        6 kB
        Jan Lehnardt
      7. conflicts_view.diff
        5 kB
        Adam Kocoloski

        Issue Links

          Activity

          Hide
          Jan Lehnardt added a comment -

          Updated the patch:

          • fixed whitespace
          • added first round of tests
          • added 'key' member to result rows to be consistent with other views

          Found open issues:

          • doesn't accept multi-key-get via POST
          • doesn't send method not allowed for invalid methods
          • I suggest modelling it after _all_docs where the HTTP handlers are really short and use a helper function to do the heavy lifting

          Back to Adam

          Show
          Jan Lehnardt added a comment - Updated the patch: fixed whitespace added first round of tests added 'key' member to result rows to be consistent with other views Found open issues: doesn't accept multi-key-get via POST doesn't send method not allowed for invalid methods I suggest modelling it after _all_docs where the HTTP handlers are really short and use a helper function to do the heavy lifting Back to Adam
          Hide
          Adam Kocoloski added a comment -

          Jan, did you forget the view_builtin.js test in the patch?

          Show
          Adam Kocoloski added a comment - Jan, did you forget the view_builtin.js test in the patch?
          Hide
          Jan Lehnardt added a comment -

          add tests file

          Show
          Jan Lehnardt added a comment - add tests file
          Hide
          Adam Kocoloski added a comment -

          I don't think I have the bandwidth to add POST to keys today. Any objections to punting on that till later? I'll attach an updated patch (just some refactoring).

          Show
          Adam Kocoloski added a comment - I don't think I have the bandwidth to add POST to keys today. Any objections to punting on that till later? I'll attach an updated patch (just some refactoring).
          Hide
          Jan Lehnardt added a comment -

          Committed in r803663.

          Show
          Jan Lehnardt added a comment - Committed in r803663.
          Hide
          Damien Katz added a comment -

          I think we should reconsider this patch.

          For one thing, it's expensive at runtime, it requires doing a linear scan on the full doc index. If you have millions of docs and no conflicts, it will must scan through every doc meta record just to tell you that.

          Another problem is you don't get any filtering or formatting. Using couchdb view, a user can construct a view that shows conflicts by author, customer, area, etc and format the results for display. Using this facility, you get no formatting or collation options.

          I favor backing this change out.

          Show
          Damien Katz added a comment - I think we should reconsider this patch. For one thing, it's expensive at runtime, it requires doing a linear scan on the full doc index. If you have millions of docs and no conflicts, it will must scan through every doc meta record just to tell you that. Another problem is you don't get any filtering or formatting. Using couchdb view, a user can construct a view that shows conflicts by author, customer, area, etc and format the results for display. Using this facility, you get no formatting or collation options. I favor backing this change out.
          Hide
          Adam Kocoloski added a comment -

          Damien objected to this patch on IRC because it's not performant (it does a full scan over the by_id btree instead of maintaining a separate index) and because it's not flexible enough (e.g. no way to identify conflicts from a particular document type). His suggestion was just to let users define a custom view if they need aggregate information on conflicts.

          I understand and agree with the performance concerns. I worry that we aren't doing a good job of helping users develop conflict-aware applications, but we could probably alleviate that with some extra hand-holding in Futon.

          Show
          Adam Kocoloski added a comment - Damien objected to this patch on IRC because it's not performant (it does a full scan over the by_id btree instead of maintaining a separate index) and because it's not flexible enough (e.g. no way to identify conflicts from a particular document type). His suggestion was just to let users define a custom view if they need aggregate information on conflicts. I understand and agree with the performance concerns. I worry that we aren't doing a good job of helping users develop conflict-aware applications, but we could probably alleviate that with some extra hand-holding in Futon.
          Hide
          Adam Kocoloski added a comment -

          rolled back out in r803690

          Show
          Adam Kocoloski added a comment - rolled back out in r803690
          Hide
          Adam Kocoloski added a comment -

          good suggestion from Paul to just keep a counter of conflicts in the db_info

          Show
          Adam Kocoloski added a comment - good suggestion from Paul to just keep a counter of conflicts in the db_info
          Hide
          Adam Kocoloski added a comment -

          0.10.0 is out the door, adjusting FixFor on all remaining unresolved issues to 0.11 by default

          Show
          Adam Kocoloski added a comment - 0.10.0 is out the door, adjusting FixFor on all remaining unresolved issues to 0.11 by default
          Hide
          Bob Dionne added a comment -

          We have a new approach to surfacing conflict counts in the db_info object, using the recent refactoring work on couch_key_tree. Have a look, it's dependent on the last patch I submitted for COUCHDB-988 and it still needs an etap
          but I'll submit it as a patch if folks think it useful.

          It strikes me as a very useful feature.

          Cheers,

          Bob

          [1] https://github.com/bdionne/couchdb/commit/2405b3bd2ef86e689912c8f

          Show
          Bob Dionne added a comment - We have a new approach to surfacing conflict counts in the db_info object, using the recent refactoring work on couch_key_tree. Have a look, it's dependent on the last patch I submitted for COUCHDB-988 and it still needs an etap but I'll submit it as a patch if folks think it useful. It strikes me as a very useful feature. Cheers, Bob [1] https://github.com/bdionne/couchdb/commit/2405b3bd2ef86e689912c8f
          Hide
          Paul Joseph Davis added a comment -

          OMG WHITESPACE CHANGES!

          The one thing that I see being potentially confusing (if I'm reading this correctly) is that it's counting the number of conflicted revisions which is not the same as "documents in conflict". Perhaps that could be confusing to people?

          Show
          Paul Joseph Davis added a comment - OMG WHITESPACE CHANGES! The one thing that I see being potentially confusing (if I'm reading this correctly) is that it's counting the number of conflicted revisions which is not the same as "documents in conflict". Perhaps that could be confusing to people?
          Hide
          Adam Kocoloski added a comment -

          Ah, good point Paul. I think counting the number of documents in conflict is preferable.

          Show
          Adam Kocoloski added a comment - Ah, good point Paul. I think counting the number of documents in conflict is preferable.
          Hide
          Robert Newson added a comment -

          One concern. "Oh, I have 10 conflicts! Which documents are they?".

          They'll have to write a view to find out, and that view would also give them the number proposed here.

          Discuss.

          Show
          Robert Newson added a comment - One concern. "Oh, I have 10 conflicts! Which documents are they?". They'll have to write a view to find out, and that view would also give them the number proposed here. Discuss.
          Hide
          Paul Joseph Davis added a comment -

          @Bob2

          But for a "status of databases" page infographic dealy point of view, it'd probably be good to have this number in there.

          Show
          Paul Joseph Davis added a comment - @Bob2 But for a "status of databases" page infographic dealy point of view, it'd probably be good to have this number in there.
          Hide
          Bob Dionne added a comment -

          Paul,

          good suggestion. Here's a version that does something like that.

          Cheers,

          Bob

          [1] https://github.com/bdionne/couchdb/commit/9c5eb734bb1515

          Show
          Bob Dionne added a comment - Paul, good suggestion. Here's a version that does something like that. Cheers, Bob [1] https://github.com/bdionne/couchdb/commit/9c5eb734bb1515
          Hide
          Paul Joseph Davis added a comment -

          Unless I'm reading that wrong, it looks to be counting the number of nodes where the last element is in conflict. Also, the first clause would appear to be have an incorrect pattern match on the accumulator.

          Show
          Paul Joseph Davis added a comment - Unless I'm reading that wrong, it looks to be counting the number of nodes where the last element is in conflict. Also, the first clause would appear to be have an incorrect pattern match on the accumulator.
          Hide
          Bob Dionne added a comment -

          This is patch that adds a conflct_count to db_info as commented on above. This reflects Paul's suggestions and feedback. I also fixed up etaps that this breaks. I don't think this change would materially impact performance but I've not tested that.

          This clearly will be very useful in bigcouch for monitoring dbs and so forth. If folks want to include it I'll be happy to write a test case for it

          Show
          Bob Dionne added a comment - This is patch that adds a conflct_count to db_info as commented on above. This reflects Paul's suggestions and feedback. I also fixed up etaps that this breaks. I don't think this change would materially impact performance but I've not tested that. This clearly will be very useful in bigcouch for monitoring dbs and so forth. If folks want to include it I'll be happy to write a test case for it
          Hide
          Adam Kocoloski added a comment -

          Hi Bob, that diff did not apply for me. Can you rebase your branch against trunk and try again? A version without the whitespace differences would also be good. One way to do that is to use the -p option to `git add` and interactively select only the substantive changes.

          Show
          Adam Kocoloski added a comment - Hi Bob, that diff did not apply for me. Can you rebase your branch against trunk and try again? A version without the whitespace differences would also be good. One way to do that is to use the -p option to `git add` and interactively select only the substantive changes.
          Hide
          Bob Dionne added a comment -

          Hi Adam,

          this diff depends on changes made in 902 and 988 also. I rebased trunk and created another.
          I also broke out the white space fixes into a separate diff. These already existed in couchdb. I added them before as I assumed we wanted to fix them.

          Cheers,

          Bob

          Show
          Bob Dionne added a comment - Hi Adam, this diff depends on changes made in 902 and 988 also. I rebased trunk and created another. I also broke out the white space fixes into a separate diff. These already existed in couchdb. I added them before as I assumed we wanted to fix them. Cheers, Bob
          Hide
          Adam Kocoloski added a comment -

          Hi Bob, thanks for following up on this. Yes, we want to fix trailing whitespace issues, thanks for breaking those out into a separate patch. There's still a lot going on in the conflicts_in_db_info2.diff, not all of which is strictly needed for this issue. Ultimately we're going to need separate patches for 902/988 and this issue.

          So, the idea has evolved into reporting the number of documents with at least one conflict, rather than the total number of losing edit branches in the DB. I think that's the right move, but I also think it means we can write a more efficient reduce function by short-circuiting the evaluation as soon as we see a fork in the tree. I haven't tested it, but maybe the following will work. It should be substantially faster for documents with lots of long edit branches:

          -spec has_conflicts([path()]) -> boolean().
          has_conflicts([{_, RootOfBranch}]) ->
          has_forks(RootOfBranch);
          has_conflicts(_MultiplePaths) ->
          true.

          -spec has_forks(branch()) -> boolean().
          has_forks({_Key, _Value, []}) ->
          false;
          has_forks({_Key, _Value, [SubBranch]}) ->
          has_forks(SubBranch);
          has_forks({_Key, _Value, [_|_] = _ForkedTree}) ->
          true.

          Show
          Adam Kocoloski added a comment - Hi Bob, thanks for following up on this. Yes, we want to fix trailing whitespace issues, thanks for breaking those out into a separate patch. There's still a lot going on in the conflicts_in_db_info2.diff, not all of which is strictly needed for this issue. Ultimately we're going to need separate patches for 902/988 and this issue. So, the idea has evolved into reporting the number of documents with at least one conflict, rather than the total number of losing edit branches in the DB. I think that's the right move, but I also think it means we can write a more efficient reduce function by short-circuiting the evaluation as soon as we see a fork in the tree. I haven't tested it, but maybe the following will work. It should be substantially faster for documents with lots of long edit branches: -spec has_conflicts( [path()] ) -> boolean(). has_conflicts( [{_, RootOfBranch}] ) -> has_forks(RootOfBranch); has_conflicts(_MultiplePaths) -> true. -spec has_forks(branch()) -> boolean(). has_forks({_Key, _Value, []}) -> false; has_forks({_Key, _Value, [SubBranch] }) -> has_forks(SubBranch); has_forks({_Key, _Value, [_|_] = _ForkedTree}) -> true.
          Hide
          Adam Kocoloski added a comment -

          Ah, scratch that - we need to check if the leaf of the conflict edit branch has been deleted, of course. Oh well.

          Show
          Adam Kocoloski added a comment - Ah, scratch that - we need to check if the leaf of the conflict edit branch has been deleted, of course. Oh well.
          Hide
          Bob Dionne added a comment -

          Here's a new version that only walks the revision tree until conflicts are detected.

          [1] https://github.com/bdionne/couchdb/tree/couchdb-462

          Show
          Bob Dionne added a comment - Here's a new version that only walks the revision tree until conflicts are detected. [1] https://github.com/bdionne/couchdb/tree/couchdb-462
          Hide
          Adam Kocoloski added a comment -

          Hi Bob, thanks. I see a couple of problems with the patch:

          1) https://github.com/bdionne/couchdb/commit/31a5672#L2R337

          It looks like you're incrementing when deleted=true, rather than false.

          2) https://github.com/bdionne/couchdb/commit/31a5672#L2R331

          I think you're double-counting by passing N + NextN to count_non_deleted/2.

          More broadly, and in the context of COUCHDB-988, I'm wondering if many of these key tree operations could benefit from a refactoring that adds a generalized couch_key_tree:fold function. I believe Randall Leeds wrote such a thing at one point in time. I'm picturing a fold that supports short-circuiting, possibly using the

          {ok, _}

          and

          {stop, _}

          tagged tuples convention from couch_btree.

          ... Actually, digressing a bit further, I'm curious if anyone has ever considered the tradeoffs of different methods of short-circuiting. The tagged tuples are one approach, but it could also be implemented with erlang:throw/1. I'd expect that when short-circuiting is rare throwing an exception is the preferred approach because it creates less garbage along the way. I'm not sure what would be best when short-circuiting is routine (e.g. in the btree).

          Show
          Adam Kocoloski added a comment - Hi Bob, thanks. I see a couple of problems with the patch: 1) https://github.com/bdionne/couchdb/commit/31a5672#L2R337 It looks like you're incrementing when deleted=true, rather than false. 2) https://github.com/bdionne/couchdb/commit/31a5672#L2R331 I think you're double-counting by passing N + NextN to count_non_deleted/2. More broadly, and in the context of COUCHDB-988 , I'm wondering if many of these key tree operations could benefit from a refactoring that adds a generalized couch_key_tree:fold function. I believe Randall Leeds wrote such a thing at one point in time. I'm picturing a fold that supports short-circuiting, possibly using the {ok, _} and {stop, _} tagged tuples convention from couch_btree. ... Actually, digressing a bit further, I'm curious if anyone has ever considered the tradeoffs of different methods of short-circuiting. The tagged tuples are one approach, but it could also be implemented with erlang:throw/1. I'd expect that when short-circuiting is rare throwing an exception is the preferred approach because it creates less garbage along the way. I'm not sure what would be best when short-circuiting is routine (e.g. in the btree).
          Hide
          Bob Dionne added a comment -

          Thanks Adam,

          Here's a fix[1] for both issues you mention, and a few more etaps to test it. Looking over couch_key_tree I agree with you and Randall that many funs have similar shapes and it could likely use a fold. I'll look at that next.

          I'm not sure about using throw in Erlang. I know in Java where try/catch is enforced I've used it for precisely this case where you want out of a recursion based on some condition. Erlang has a different approach to failure. I'll look into it. What' s really needed in these cases is Scheme's call-with-current-continuation.

          Cheers,

          Bob

          [1] https://github.com/bdionne/couchdb/commit/593bcde1b4.patch

          Show
          Bob Dionne added a comment - Thanks Adam, Here's a fix [1] for both issues you mention, and a few more etaps to test it. Looking over couch_key_tree I agree with you and Randall that many funs have similar shapes and it could likely use a fold. I'll look at that next. I'm not sure about using throw in Erlang. I know in Java where try/catch is enforced I've used it for precisely this case where you want out of a recursion based on some condition. Erlang has a different approach to failure. I'll look into it. What' s really needed in these cases is Scheme's call-with-current-continuation. Cheers, Bob [1] https://github.com/bdionne/couchdb/commit/593bcde1b4.patch
          Hide
          Bob Dionne added a comment -

          Looking over the couch_key_tree functions we see a problem with map_leafs. It takes a function that's expected to have two arguments. However in the one place it's called, couch_db_updater:handle_call({purge_docs... it looks like it's being passed a function of one argument.

          The one likely test that might exercise this path, purge.js, does not. Is this dead code or a potential bug? This code has been there for a while

          Show
          Bob Dionne added a comment - Looking over the couch_key_tree functions we see a problem with map_leafs. It takes a function that's expected to have two arguments. However in the one place it's called, couch_db_updater:handle_call({purge_docs... it looks like it's being passed a function of one argument. The one likely test that might exercise this path, purge.js, does not. Is this dead code or a potential bug? This code has been there for a while
          Hide
          Adam Kocoloski added a comment -

          As mentioned in COUCHDB-1065, that bug is only triggered is a user tries to purge a strict subset of the leaf revisions of a document. I imagine that's a very unusual use case, which may help explain why we've never seen this reported.

          Show
          Adam Kocoloski added a comment - As mentioned in COUCHDB-1065 , that bug is only triggered is a user tries to purge a strict subset of the leaf revisions of a document. I imagine that's a very unusual use case, which may help explain why we've never seen this reported.
          Hide
          Bob Dionne added a comment - - edited

          I started reworking couch_key_tree[1] as suggested, to try and tease out some fold style functions. This should probably be done in COUCHDB-988 but I wanted to use has_conflicts as one of the tests. I'm able to refactor 4 or 5 so far but it required two types of fold. I'm calling them foldl and foldr for now (no relation to lists:foldl and lists:foldr).

          So far the results are mixed in that I haven't reduced the amount of code. I'm sticking to just reworking the functions in couch_key_tree internally and not changing how they interact with the rest of the code. It could turn out that some rework
          of how couch_key_tree is used might result in a simpler couch_key_tree.

          [1] https://github.com/bdionne/couchdb/commits/462

          Show
          Bob Dionne added a comment - - edited I started reworking couch_key_tree [1] as suggested, to try and tease out some fold style functions. This should probably be done in COUCHDB-988 but I wanted to use has_conflicts as one of the tests. I'm able to refactor 4 or 5 so far but it required two types of fold. I'm calling them foldl and foldr for now (no relation to lists:foldl and lists:foldr). So far the results are mixed in that I haven't reduced the amount of code. I'm sticking to just reworking the functions in couch_key_tree internally and not changing how they interact with the rest of the code. It could turn out that some rework of how couch_key_tree is used might result in a simpler couch_key_tree. [1] https://github.com/bdionne/couchdb/commits/462
          Hide
          Jan Lehnardt added a comment -

          Bump to 1.3.x

          Show
          Jan Lehnardt added a comment - Bump to 1.3.x
          Hide
          Joan Touzet added a comment -

          Any chance of reviving this patch, Bob D?

          Show
          Joan Touzet added a comment - Any chance of reviving this patch, Bob D?

            People

            • Assignee:
              Bob Dionne
              Reporter:
              Adam Kocoloski
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:

                Development