Uploaded image for project: 'CouchDB'
  1. CouchDB
  2. COUCHDB-977 make skip leverage inner reduction node counts
  3. COUCHDB-1076

_all_docs performance degrades as doc_del_count increases

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.1.1, 1.2
    • None
    • None

    Description

      The time required to query _all_docs?limit=1 can be proportional to the doc_del_count of the database depending on where the deleted docs are in the sort order. As kocolosk explained on IRC:

      The tree that serves as the source for _all_docs contains a small record for each document, deleted or not ... if you have a large number of deleted docs and their IDs are interspersed with the non-deleted ones i can imagine that it would cause additional seeks when streaming the _all_docs response

      In my use case (https://github.com/natevw/RQMS) and in other cases (e.g. a rolling log or any long-lived database), the deleted docs may all be at the beginning of the _all_docs view, making query performance end up like using "?limit=N&skip=doc_del_count".

      To improve the performance in cases where large blocks of documents have been deleted, kocolosk notes:

      [10:30am] kocolosk: the inner nodes in the btree currently report the doc_count and doc_del_count
      [10:31am] kocolosk: we might be able to rewrite the function that walks the btree so that it checks if the doc_count underneath an inner node is zero
      [10:31am] kocolosk: and then it can skip that part of the tree entirely
      [10:31am] kocolosk: instead of descending all the way to the leaf nodes and skipping deleted documents one by one
      [10:31am] n[ate]vw: yeah, that'd have the benefit of being a code-only change
      [10:31am] kocolosk: right
      [10:32am] kocolosk: i think it should work rather nicely, though it probably requires some interesting spelunking into the depths of couch_btree
      [10:33am] kocolosk: davisp: whaddya think? would what i'm proposing be possible? (checking for doc_count=0 in the inner node reduction and then skipping ahead when serving _all_docs)
      ...
      [10:39am] davisp: n[ate]vw: Its definitely doable, its just a question of how to do it so that I don't bleed from the eyes when trying to read through the implementation
      ...
      [10:40am] rnewson: davisp: right. we definitely want to preserve the high readability of the current btree code. /s
      [10:40am] davisp: kocolosk: The only thing I can think of is passing a function to the view iteration that gets called to evaluate whether it should decend to a child node based on the key/reduction pair
      [10:41am] kocolosk: davisp: what you're proposing would also allow for (more) efficient implementation of skip

      Attachments

        Issue Links

          Activity

            People

              tilgovi Randall Leeds
              natevw Nathan Vander Wilt
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: