Uploaded image for project: 'CouchDB'
  1. CouchDB
  2. COUCHDB-2784

Re-optimize skip query-string parameter in clusters

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Won't Fix
    • 2.0.0
    • None
    • Database Core
    • None

    Description

      In COUCHDB-977 we implemented a more efficient version of the skip function that relies on the document counts we maintain in the inner nodes of couch_btree. The 2.0 codebase did not initially take advantage of this enhancement, because when a user specifies `skip=X` we don't know a priori how many rows will be skipped from each shard.

      The current implementation tells each shard to not skip any rows and then has the coordinator discard the first N rows after doing the mergesort. It's O(N) complexity just like the bad old days before COUCHDB-977 and is actually substantially more expensive because of all the message traffic.

      Good news is we can do better. For a database with Q shards and a request specifying ?skip=X We know that either a) at least one of the shards will end skipping at least `X / Q` rows, or b) the entire response body will be empty. So, I propose the following:

      1. Set the per-shard skip value to `X div Q`
        • If a shard has fewer than `X div Q` rows remaining it should send its last row
        • If `X div Q` is zero we can short-circuit and just use the current algorithm.
      2. The coordinator sorts the first responses from each shard. It then sends the key of the row that sorts first (let's call it Foo) back to all the shards
      3. Each shard counts the number of rows in between the original startkey and Foo and sends that number, then starts streaming with Foo as the new startkey
      4. The coordinator deducts the computed per-shard skip values from the user-specified skip and then takes care of the remainder in the usual way we do it today (i.e. by consuming the rows as they come in).

      What do you think? Did I overlook anything here?

      Attachments

        Activity

          People

            kocolosk Adam Kocoloski
            kocolosk Adam Kocoloski
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: