SOLR-5463

Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 6.0
    • Component/s: None
    • Labels: None

    Description

      I'd like to revisit a solution to the problem of "deep paging" in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the Lucene level: require clients to pass back a token indicating the sort values of the last document seen on the previous "page". This is similar to the "cursor" model I've seen in several other REST APIs that support "pagination" over large sets of results (notably the Twitter API and its "since_id" param), except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending.
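
To make the token idea concrete, here is a minimal in-memory sketch of "search after" paging over a multi-level sort. It is only an illustration of the concept: it is not how Lucene or Solr implement it, and the names `make_comparator`, `search_after`, and the `(field, direction)` sort spec are hypothetical.

```python
from functools import cmp_to_key


def make_comparator(sort_spec):
    """Build a comparator from a list of (field, direction) pairs.

    direction is "asc" or "desc". Hypothetical helper for illustration.
    """
    def compare(a, b):
        for field, direction in sort_spec:
            if a[field] != b[field]:
                result = -1 if a[field] < b[field] else 1
                return result if direction == "asc" else -result
        return 0
    return compare


def search_after(docs, sort_spec, rows, after=None):
    """Return (page, token) for one page of results.

    The token is the list of sort values of the last document on the page.
    Passing it back as `after` resumes immediately past that document.
    """
    compare = make_comparator(sort_spec)
    ordered = sorted(docs, key=cmp_to_key(compare))
    if after is not None:
        # Rebuild a pseudo-document from the token and keep only the
        # documents that sort strictly after it.
        marker = {field: value for (field, _), value in zip(sort_spec, after)}
        ordered = [d for d in ordered if compare(d, marker) > 0]
    page = ordered[:rows]
    # On an empty page the token is returned unchanged.
    token = [page[-1][field] for field, _ in sort_spec] if page else after
    return page, token
```

Because the last sort field in this sketch is assumed to be unique per document, the token identifies exactly one position in the ordering, which is why a unique tie breaker matters.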

      SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well.

      I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which:

      • supports arbitrary field sorts in addition to sorting by score
      • works in distributed mode

      Basic Usage

      • send a request with sort=X&start=0&rows=N&cursorMark=*
        • sort can be anything, but must include the uniqueKey field (as a tie breaker)
        • "N" can be any number you want per page
        • start must be "0"
        • "*" denotes you want to use a cursor starting at the beginning mark
      • parse the response body and extract the (String) nextCursorMark value
      • in the subsequent request, replace the "*" value from your initial request params with the nextCursorMark value from the response
      • repeat until the nextCursorMark value stops changing, or you have collected as many docs as you need
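
The steps above can be sketched as a small client loop. This is a hedged example, not part of the patch: the `q` and `wt=json` parameters, the select URL passed in by the caller, and the assumed JSON layout (documents under `response.docs`, `nextCursorMark` at the top level) are assumptions, and `build_cursor_url`, `http_fetch`, and `iter_docs` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_cursor_url(base_url, query, sort, rows, cursor_mark):
    """Build the URL for one page of a cursor walk."""
    params = {
        "q": query,
        "sort": sort,              # must include the uniqueKey field as a tie breaker
        "start": 0,                # start must be 0 when a cursor is used
        "rows": rows,              # any page size
        "cursorMark": cursor_mark,
        "wt": "json",              # assumption: ask for a JSON response body
    }
    return base_url + "?" + urlencode(params)


def http_fetch(url):
    """Fetch a URL and parse the body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_docs(base_url, query, sort, rows, fetch=http_fetch, limit=None):
    """Yield documents page by page until the cursor stops changing.

    `fetch` is injectable so the loop can be exercised without a server.
    `limit` stops early once that many documents have been yielded.
    """
    cursor = "*"  # "*" asks for a cursor starting at the beginning
    seen = 0
    while True:
        page = fetch(build_cursor_url(base_url, query, sort, rows, cursor))
        for doc in page["response"]["docs"]:
            yield doc
            seen += 1
            if limit is not None and seen >= limit:
                return
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            return  # cursor did not advance: nothing more to read
        cursor = next_cursor
```

A caller might write `for doc in iter_docs(select_url, "*:*", "score desc, id asc", 100): ...`, where `select_url` points at a collection's select handler and `id` is assumed to be the uniqueKey field.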

      Attachments

        1. SOLR-5463-randomized-faceting-test.patch
          8 kB
          Steven Rowe
        2. SOLR-5463.patch
          110 kB
          Chris M. Hostetter
        3. SOLR-5463.patch
          115 kB
          Chris M. Hostetter
        4. SOLR-5463.patch
          111 kB
          Chris M. Hostetter
        5. SOLR-5463.patch
          114 kB
          Steven Rowe
        6. SOLR-5463.patch
          118 kB
          Chris M. Hostetter
        7. SOLR-5463.patch
          126 kB
          Chris M. Hostetter
        8. SOLR-5463.patch
          132 kB
          Chris M. Hostetter
        9. SOLR-5463__straw_man.patch
          39 kB
          Chris M. Hostetter
        10. SOLR-5463__straw_man.patch
          58 kB
          Chris M. Hostetter
        11. SOLR-5463__straw_man.patch
          64 kB
          Chris M. Hostetter
        12. SOLR-5463__straw_man.patch
          70 kB
          Chris M. Hostetter
        13. SOLR-5463__straw_man.patch
          70 kB
          Chris M. Hostetter
        14. SOLR-5463__straw_man.patch
          94 kB
          Chris M. Hostetter
        15. SOLR-5463__straw_man.patch
          94 kB
          Chris M. Hostetter
        16. SOLR-5463__straw_man.patch
          94 kB
          Chris M. Hostetter
        17. SOLR-5463__straw_man.patch
          97 kB
          Chris M. Hostetter
        18. SOLR-5463__straw_man.patch
          99 kB
          Chris M. Hostetter
        19. SOLR-5463__straw_man__MissingStringLastComparatorSource.patch
          3 kB
          Steven Rowe

            People

              Assignee: Chris M. Hostetter (hossman)
              Reporter: Chris M. Hostetter (hossman)
              Votes: 12
              Watchers: 21
