Solr / SOLR-1726

Deep Paging and Large Results Improvements

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: 4.0
    • Component/s: None
    • Labels:
      None

      Description

      There may be ways to improve the collection of results for "deep paging" by passing Solr/Lucene more information about the last page of results seen, thereby saving priority queue operations. See LUCENE-2215.

      There may also be better options worth exploring for retrieving large numbers of rows at a time. See LUCENE-2127.
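A minimal sketch of the priority-queue saving described above, in plain Java with no Lucene dependency. Everything in it (DeepPagingSketch, Hit, Page, offsetPage, pageAfter) is a hypothetical stand-in, not a Lucene or Solr API. It assumes hits are ranked by score descending with ties broken by lower doc id, and it compares ordinary start/rows paging, whose queue must hold start + rows hits, with a cursor that remembers the last (score, doc) pair seen and only ever queues rows hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model of offset paging vs. cursor ("search after") paging; not Lucene code. */
final class DeepPagingSketch {
    /** A (score, docId) pair standing in for Lucene's ScoreDoc. */
    static final class Hit {
        final float score;
        final int doc;
        Hit(float score, int doc) { this.score = score; this.doc = doc; }
    }

    /** A page of hits plus the largest priority-queue size reached while collecting it. */
    static final class Page {
        final List<Hit> hits;
        final int peakQueueSize;
        Page(List<Hit> hits, int peakQueueSize) { this.hits = hits; this.peakQueueSize = peakQueueSize; }
    }

    // Ranking order: higher score first, ties broken by lower doc id.
    static final Comparator<Hit> RANK =
        Comparator.comparingDouble((Hit h) -> -h.score).thenComparingInt(h -> h.doc);

    /** Keeps the best 'size' hits that rank strictly after 'after' (null means start at the top). */
    private static Page collect(List<Hit> matches, Hit after, int size) {
        // Reversed ranking puts the worst kept hit at the head of the heap, ready for eviction.
        PriorityQueue<Hit> pq = new PriorityQueue<>(RANK.reversed());
        int peak = 0;
        for (Hit h : matches) {
            if (after != null && RANK.compare(h, after) <= 0) continue; // already shown on an earlier page
            pq.add(h);
            if (pq.size() > size) pq.poll(); // drop the current worst hit
            peak = Math.max(peak, pq.size());
        }
        List<Hit> top = new ArrayList<>(pq);
        top.sort(RANK);
        return new Page(top, peak);
    }

    /** start/rows paging: the queue holds start + rows hits, then the first 'start' are thrown away. */
    static Page offsetPage(List<Hit> matches, int start, int rows) {
        Page all = collect(matches, null, start + rows);
        int from = Math.min(start, all.hits.size());
        return new Page(new ArrayList<>(all.hits.subList(from, all.hits.size())), all.peakQueueSize);
    }

    /** Cursor paging: only 'rows' hits are ever queued, whatever the page number. */
    static Page pageAfter(List<Hit> matches, Hit after, int rows) {
        return collect(matches, after, rows);
    }
}
```

In this model, fetching the sixth 10-hit page of 100 matches returns the same ten hits either way, but the offset version's queue peaks at 60 entries while the cursor version's stays at 10; the gap grows with page depth.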

      1. TopScoreDocCollector.java
        10 kB
        Manojkumar Rangasamy Kannadasan
      2. TopDocsCollector.java
        7 kB
        Manojkumar Rangasamy Kannadasan
      3. SolrIndexSearcher.java
        77 kB
        Manojkumar Rangasamy Kannadasan
      4. SOLR-1726.patch
        10 kB
        Manojkumar Rangasamy Kannadasan
      5. SOLR-1726.patch
        10 kB
        Grant Ingersoll
      6. ResponseBuilder.java
        11 kB
        Manojkumar Rangasamy Kannadasan
      7. QueryComponent.java
        42 kB
        Manojkumar Rangasamy Kannadasan
      8. QParser.java
        11 kB
        Manojkumar Rangasamy Kannadasan
      9. CommonParams.java
        6 kB
        Manojkumar Rangasamy Kannadasan

        Issue Links

          Activity

          Grant Ingersoll created issue -
          Grant Ingersoll made changes -
          Field Original Value New Value
          Link This issue depends on LUCENE-2127 [ LUCENE-2127 ]
          Grant Ingersoll made changes -
          Link This issue depends on LUCENE-2215 [ LUCENE-2215 ]
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Hoss Man made changes -
          Fix Version/s Next [ 12315093 ]
          Fix Version/s 1.5 [ 12313566 ]
          Grant Ingersoll made changes -
          Link This issue is duplicated by SOLR-2218 [ SOLR-2218 ]
          Hoss Man made changes -
          Fix Version/s 3.2 [ 12316172 ]
          Fix Version/s Next [ 12315093 ]
          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Robert Muir made changes -
          Fix Version/s 3.3 [ 12316471 ]
          Fix Version/s 3.2 [ 12316172 ]
          Robert Muir made changes -
          Fix Version/s 3.4 [ 12316683 ]
          Fix Version/s 4.0 [ 12314992 ]
          Fix Version/s 3.3 [ 12316471 ]
          Robert Muir added a comment -

          3.4 -> 3.5

          Robert Muir made changes -
          Fix Version/s 3.5 [ 12317876 ]
          Fix Version/s 3.4 [ 12316683 ]
          Grant Ingersoll added a comment -

          Now that IndexSearcher.searchAfter() has been added, we should be able to simply hook this in by allowing the user to pass the "score doc" from the previous page into Solr. I would suggest the parameters &pageDoc= and &pageScore=, but am open to other suggestions.
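To illustrate the round trip Grant describes, here is a hypothetical client helper that builds the next request's query string. It assumes pageDoc carries the last document's identifier as a string and pageScore its score; PageParams and nextPageQuery are illustrative names, and q and rows are the only standard Solr parameters it uses.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical client helper for the proposed &pageDoc= / &pageScore= parameters. */
final class PageParams {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /**
     * Builds the query string for the next page. On the first page lastDoc and lastScore are
     * null and the paging parameters are omitted, so the request is an ordinary top-N query.
     */
    static String nextPageQuery(String q, int rows, String lastDoc, Float lastScore) {
        StringBuilder sb = new StringBuilder("q=").append(enc(q)).append("&rows=").append(rows);
        if (lastDoc != null && lastScore != null) {
            sb.append("&pageDoc=").append(enc(lastDoc))
              .append("&pageScore=").append(lastScore);
        }
        return sb.toString();
    }
}
```

After each page, the caller would pass the last hit's identifier and score back in to get the following page.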

          Yonik Seeley added a comment -

          &pageDoc= and &pageScore=

          Having the user pass these in seems very error prone.
          They also aren't going to know when the searcher changes (and the internal docid is invalidated).
          Also, it's not just pageScore that would need to be passed, but a list of the sort values (which we don't even support returning yet).
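Yonik's point about sort values can be made concrete with a small sketch: under a general sort, the cursor has to carry the last hit's value for every sort clause, not just a score. The class below is hypothetical, and the final tie-break on uniqueKey is an added assumption (so that hits with identical sort values still have a total order), not something proposed in this thread.

```java
/** Hypothetical cursor for paging under an arbitrary multi-clause sort (not a Solr or Lucene class). */
final class SortCursor {
    final Object[] sortValues; // the last hit's value for each sort clause, e.g. {score, date}
    final String uniqueKey;    // assumed tie-breaker so hits with equal sort values still have an order

    SortCursor(Object[] sortValues, String uniqueKey) {
        this.sortValues = sortValues.clone();
        this.uniqueKey = uniqueKey;
    }

    /** Compares two hits clause by clause; ascending[i] gives the direction of clause i. */
    @SuppressWarnings("unchecked")
    static int compare(Object[] a, String keyA, Object[] b, String keyB, boolean[] ascending) {
        for (int i = 0; i < ascending.length; i++) {
            int c = ((Comparable<Object>) a[i]).compareTo(b[i]);
            if (c != 0) return ascending[i] ? c : -c;
        }
        return keyA.compareTo(keyB);
    }

    /** True if the hit sorts strictly after this cursor, i.e. it belongs on a later page. */
    boolean isAfter(Object[] values, String key, boolean[] ascending) {
        return compare(values, key, sortValues, uniqueKey, ascending) > 0;
    }
}
```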

          Robert Muir added a comment -

          also keep in mind that IS.searchAfter hasn't yet been implemented for the sorting collectors.

          Robert Muir added a comment -

          They also aren't going to know when the searcher changes (and the internal docid is invalidated).

          Seems like this is really unrelated to deep paging though, wouldn't this cause normal paging thru search results to be screwy?!

          Yonik Seeley added a comment -

          Seems like this is really unrelated to deep paging though, wouldn't this cause normal paging thru search results to be screwy?!

          Well, if pageDoc is Solr's external uniqueKey, then you're right (it's only slightly worse than normal paging across different searchers).

          Grant Ingersoll added a comment -

          Having the user pass these in seems very error prone.

          How else would we do it? You don't want Solr keeping state, IMO.

          They also aren't going to know when the searcher changes (and the internal docid is invalidated).

          I was thinking it would be the external Unique ID, not Lucene's internal id, which would mean there would have to be a lookup. And, yes, you are correct they wouldn't know when the searcher changes, but you have that issue already with paging, so it is no worse than the existing case.

          Also, it's not just pageScore that would need to be passed, but a list of the sort values (which we don't even support returning yet).

          right, we would have to add that support to Lucene first. For Solr, we would need to pass in either the score or the value.

          Robert Muir added a comment -

          right, we would have to add that support to Lucene first.

          I think this isn't bad from the high level: we just generalize searchAfter(ScoreDoc after, Query query, Filter filter, ..) to
          searchAfter(ScoreDoc after, Query query, Filter filter, Sort sort, ...)

          The problem is that there are 87 different specialized sorting collectors, and searchAfter by score seems to be the real
          Lucene use case; for 'deep paging thru results not sorted by score' I would use a database instead!

          I'm not against us adding it, just not motivated to for those reasons.

          Grant Ingersoll added a comment -

          Likely true. I do tend to think this is mostly a sort by score issue as well, but I can see it being asked.

          Grant Ingersoll added a comment -

          I would suggest we just do score for now, but name the parameter to be pageSort instead of pageScore. Alternatively, maybe we should name them &lastId and &lastSortVal.

          Robert Muir added a comment -

          Yeah, from my perspective I would prefer for the API to be 'complete'.

          One idea would be to start with one or two implementations (maybe in/out of order) for the sorting case, and not overspecialize it yet.

          • for page 1, the ScoreDoc (FieldDoc really) will be null, so we just return the normal impl anyway.
          • even if our searchAfter isn't huper-duper fast, the user can always make the tradeoff like with page-by-score. They can always just pass null until page 10 or so, if they compute that it only starts to 'help' then.
          Robert Muir added a comment -

          I opened LUCENE-3514 with this idea.

          David Smiley added a comment -

          I've been following these "large result handling" related issues with some interest. I think there are some types of applications, the ones that I see at work, where the client essentially wants to process the entire results from Solr, ideally in a streaming manner. Paging (that is, making multiple requests of the dataset to Solr) would ideally not happen because it's kind of a pain and there are session / stateless issues and efficiency ones. Ideally Solr would allow SolrJ to stream the results. Aggregate information like facets would need to be calculated and retrievable up front, but anything per-document like the document's stored fields that were asked for and highlighting would be streamed. What do you guys think of this?

          Ryan McKinley added a comment -

          Ideally Solr would allow SolrJ to stream the results.

          Check:
          SolrServer.html#queryAndStreamResponse

          I have used it with up to 1M docs without much issue...

          David Smiley added a comment -

          Awesome Ryan; thanks! I suspected it might exist but I didn't find it after looking for it so I thought I was mistaken.

          Manojkumar Rangasamy Kannadasan added a comment -

          Hi,
          I am working on adding a new type of query for SOLR-1726 that includes the last page's score and doc (lastpageScore and lastDoc) in the query, as Grant described. Can anyone please point me to the place in the code where I can map this query to a new function in SolrIndexSearcher?
          Kindly reply.

          Grant Ingersoll added a comment -

          Hi Manoj,

          This shouldn't require a new query, since it should work with all queries; instead it needs new parameters that get passed in alongside the query (see earlier comments that lay out the parameter names). You might start by looking at how something like the &rows or &start parameter is handled and passed down to the SolrIndexSearcher.
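A minimal sketch of reading the two proposed parameters alongside rows and start, assuming they arrive as plain strings. In Solr they would come from the request's SolrParams; the Map, the PagingParamParser name, and the rule that both parameters must be supplied together are assumptions made so the sketch runs on its own.

```java
import java.util.Map;

/** Hypothetical parsing of the proposed &pageDoc= / &pageScore= request parameters. */
final class PagingParamParser {
    /** The parsed cursor: the last document seen and its score. */
    static final class Cursor {
        final String pageDoc;
        final float pageScore;
        Cursor(String pageDoc, float pageScore) { this.pageDoc = pageDoc; this.pageScore = pageScore; }
    }

    /**
     * Returns null when neither parameter is present (an ordinary request), a Cursor when both are,
     * and throws IllegalArgumentException when only one of the pair is supplied.
     * A malformed pageScore surfaces as NumberFormatException from Float.parseFloat.
     */
    static Cursor parse(Map<String, String> params) {
        String doc = params.get("pageDoc");
        String score = params.get("pageScore");
        if (doc == null && score == null) return null;
        if (doc == null || score == null) {
            throw new IllegalArgumentException("pageDoc and pageScore must be given together");
        }
        return new Cursor(doc, Float.parseFloat(score));
    }
}
```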

          Manojkumar Rangasamy Kannadasan added a comment -

          Hi, I have attached an implementation for this issue, assuming the same functionality as IS.searchAfter with no sort. Kindly review my fix and provide feedback. The two parameters used for paging are pageScore and pageDoc.

          Manojkumar Rangasamy Kannadasan made changes -
          Attachment SOLR-1726.patch [ 12505121 ]
          Attachment QParser.java [ 12505122 ]
          Attachment SolrIndexSearcher.java [ 12505123 ]
          Attachment QueryComponent.java [ 12505124 ]
          Attachment ResponseBuilder.java [ 12505125 ]
          Attachment CommonParams.java [ 12505126 ]
          Attachment TopDocsCollector.java [ 12505127 ]
          Attachment TopScoreDocCollector.java [ 12505128 ]
          Simon Willnauer made changes -
          Fix Version/s 3.6 [ 12319065 ]
          Fix Version/s 3.5 [ 12317876 ]
          Grant Ingersoll added a comment -

          Hi Manoj,

          This looks OK as a start. Would be nice to have tests to go with it.

          Why the overriding of getTotalHits on the TopScoreDocCollector? I don't think returning collectedHits is the right thing to do there.

          Also, you should be able to avoid an extra Collector create call at:

              topCollector = TopScoreDocCollector.create(len, true);
              //Issue 1726 Start
              if (cmd.getScoreDoc() != null) {
                  topCollector = TopScoreDocCollector.create(len, cmd.getScoreDoc(), true); //create the Collector with InOrderPagingCollector
              }

          But that is easy enough to fix.

          Manojkumar Rangasamy Kannadasan added a comment -

          Hi Grant, thanks for your comments. Regarding collectedHits: if there are 4 docs in the results and we want to return only the bottom 2 by giving the appropriate pageScore and pageDoc, the expected result is to return only 2 docs. But totalHits returns all 4 docs. That's the reason I used collectedHits.
          Kindly correct me if my understanding is wrong.

          Grant Ingersoll added a comment -

          totalHits should return the count of all the hits regardless of the number that are actually being collected. In other words, totalHits could be a million, but we only return the top 10. collectedHits only returns the count of how many are being returned.
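Grant's distinction can be checked against Manoj's four-document example with a small model. This is a hypothetical sketch, not Lucene's collector: it assumes the collector increments totalHits for every match before applying the cursor check, and increments collectedHits only for documents after the cursor.

```java
/** Hypothetical model of the two counters a paging collector can keep (not Lucene's collector). */
final class HitCounts {
    int totalHits;     // every matching document, including those at or before the cursor
    int collectedHits; // only documents after the cursor: the candidates for this page

    /**
     * scores[doc] is the score of each matching doc; the cursor is (afterScore, afterDoc).
     * A doc counts as already seen if it scores higher, or ties and has a doc id <= afterDoc.
     */
    static HitCounts count(float[] scores, float afterScore, int afterDoc) {
        HitCounts c = new HitCounts();
        for (int doc = 0; doc < scores.length; doc++) {
            c.totalHits++; // counted before the cursor check, so numFound stays the full total
            float s = scores[doc];
            boolean seen = s > afterScore || (s == afterScore && doc <= afterDoc);
            if (!seen) c.collectedHits++;
        }
        return c;
    }
}
```

With scores 4, 3, 2, 1 and a cursor at the second document (score 3, doc 1), totalHits is 4, which is what numFound should report, while collectedHits is 2.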

          Grant Ingersoll added a comment -

          Brings this patch up to trunk, adds tests, cleans up a few areas. I think it is ready to go.

          Grant Ingersoll made changes -
          Attachment SOLR-1726.patch [ 12512422 ]
          Grant Ingersoll added a comment -

          Committed, thanks Manoj.

          Grant Ingersoll made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 3.6 [ 12319065 ]
          Resolution Fixed [ 1 ]
          Yonik Seeley added a comment -

          Re-opening... this doesn't implement what was discussed. It uses an internal lucene docid, which is pretty dangerous.

          Yonik Seeley made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Yonik Seeley added a comment -

          pageScore should also be renamed to pageSort (or pageVal or pageSortVal) to future-proof it for when we can page by more than just score.

          QParser also seems like an odd place for handling/parsing these parameters... but I guess it's not a big deal.

          Grant Ingersoll added a comment -

          Why is it any more dangerous than the action itself? If the reader has changed, this whole act is unnatural to begin with.

          Yonik Seeley added a comment -

          You seemed to agree that it should be the external ID previously.

          Why is it any more dangerous than the action itself? If the reader has changed, this whole act is unnatural to begin with.

          It's the degree of breakage. A small change to the index should yield only a small amount of potential breakage, but with internal docids it can be catastrophic (in unpredictable ways).
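Yonik's "degree of breakage" argument can be illustrated with a toy model in which results come back in internal docid order, as happens when every hit has the same score (for example q=*:*). In this hypothetical sketch an index snapshot is just a list of uniqueKeys and a docid is a list position; deleting one document and merging shifts every later docid down by one.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: internal docids are list positions, and they shift when a merge drops a deletion. */
final class DocIdDrift {
    /** Everything after a cursor given as an internal docid; only meaningful for the same snapshot. */
    static List<String> afterInternalId(List<String> index, int lastDocId) {
        int from = Math.max(0, Math.min(lastDocId + 1, index.size()));
        return new ArrayList<>(index.subList(from, index.size()));
    }

    /** Everything after a cursor given as a uniqueKey, resolved against the current snapshot. */
    static List<String> afterUniqueKey(List<String> index, String lastKey) {
        int pos = index.indexOf(lastKey); // -1 if the key has since been deleted
        // Assumption for the sketch: a deleted cursor document restarts from the top (pos + 1 == 0).
        return afterInternalId(index, pos);
    }
}
```

If page 1 showed a, b, c and b is then deleted and merged away, the internal-id cursor (docid 2) silently skips d, whereas the uniqueKey cursor ("c") resumes at d.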

          Yonik Seeley added a comment -

          Some other issues:

          • the optimization doesn't work if the docset is also requested (i.e. if facet=true) since it's only added in one place.
          • on a quick test, I'm getting a maxScore=NaN
            <result name="response" numFound="29" start="0" maxScore="NaN">
            

            Not sure if that's expected, but it's likely to mess up at least some clients

          • when using pageDoc, the results get incorrectly cached as a non-paged query (and hence other requests that use the same query will be incorrect)
          • when using pageDoc, any previous cached queries will be incorrectly used and hence incorrect results will be returned
          • it was pretty easy to cause a NPE (but I haven't had time to look into the causes yet):
            http://localhost:8983/solr/select?q=*:*&pageDoc=20&pageScore=1.0&fl=[docid],score
            java.lang.NullPointerException
            	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:566)
            	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
            
          • if you look at the test for this, the query only ever matches a single doc! The fact that the test passes while using paging actually means that paging isn't working (since the second page should obviously yield no results).

          I've disabled this for now since it's not ready for prime-time and since it messes with non-deep-paged results.
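Yonik's two caching bullets suggest that the paging cursor has to be part of whatever key a result is cached under. A minimal sketch, assuming a cache keyed on query, sort, and the two proposed paging parameters; QueryCacheKey is a hypothetical class, not Solr's own result-cache key.

```java
import java.util.Objects;

/** Hypothetical result-cache key that includes the paging cursor (not Solr's own cache key class). */
final class QueryCacheKey {
    final String query;
    final String sort;
    final String pageDoc;  // null for an ordinary, non-paged request
    final Float pageScore; // null for an ordinary, non-paged request

    QueryCacheKey(String query, String sort, String pageDoc, Float pageScore) {
        this.query = query;
        this.sort = sort;
        this.pageDoc = pageDoc;
        this.pageScore = pageScore;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof QueryCacheKey)) return false;
        QueryCacheKey k = (QueryCacheKey) o;
        // The cursor fields take part in equality, so paged and unpaged results never share an entry.
        return Objects.equals(query, k.query) && Objects.equals(sort, k.sort)
            && Objects.equals(pageDoc, k.pageDoc) && Objects.equals(pageScore, k.pageScore);
    }

    @Override
    public int hashCode() {
        return Objects.hash(query, sort, pageDoc, pageScore);
    }
}
```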

          Robert Muir added a comment -

          on a quick test, I'm getting a maxScore=NaN

          From the Lucene point of view:

          for the first page, because the ScoreDoc is null, maxScore is set (in fact, the ordinary collector is used);
          for subsequent pages, searchAfter never sets it in the TopDocs, because it would just be telling you something you already know (and would cost extra CPU per collect).
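Given Robert's explanation, a client that wants a maxScore on every page could remember the value from the first page. A minimal client-side sketch; MaxScoreCarry is a hypothetical name, and it assumes the server reports NaN (as in Yonik's quick test) whenever maxScore was not computed.

```java
/** Hypothetical client-side helper: keep the last real maxScore when later pages report NaN. */
final class MaxScoreCarry {
    private float maxScore = Float.NaN; // unknown until a page reports a real value

    /** Returns the page's own maxScore if it has one, otherwise the last real value seen. */
    float onPage(float reportedMaxScore) {
        if (!Float.isNaN(reportedMaxScore)) {
            maxScore = reportedMaxScore;
        }
        return maxScore;
    }
}
```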

          Erik Hatcher added a comment -

          What's the status of this? We still have in trunk CHANGES.txt this:

          * SOLR-1726: Added deep paging support to search (sort by score only) which should use less memory when paging deeply into results
           by keeping the priority queue small. (Manojkumar Rangasamy Kannadasan, gsingers)
          

          but the code has been reverted from trunk as I understand it. Remove the CHANGES entry until this gets straightened out?

          Mark Miller added a comment -

          Remove the CHANGES entry until this gets straightened out?

          +1 - looks like Mike has made this work with non score sort as well, for when we put it back in.

          Hoss Man added a comment -

          Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          Hoss Man made changes -
          Fix Version/s 4.0 [ 12322455 ]
          Fix Version/s 4.0-ALPHA [ 12314992 ]
          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Robert Muir made changes -
          Fix Version/s 4.0 [ 12322551 ]
          Fix Version/s 4.0-BETA [ 12322455 ]
          Hoss Man added a comment -

          There is no indication that anyone is actively working on this issue, so removing 4.0 from the fixVersion.

          Hoss Man made changes -
          Fix Version/s 4.0 [ 12322551 ]
          Otis Gospodnetic made changes -
          Fix Version/s 4.2 [ 12323893 ]
          Robert Muir made changes -
          Fix Version/s 4.3 [ 12324128 ]
          Fix Version/s 4.2 [ 12323893 ]
          Jan Høydahl made changes -
          Link This issue is duplicated by SOLR-1524 [ SOLR-1524 ]
          Jan Høydahl added a comment -

          What's the status of this issue? Ref http://wiki.apache.org/solr/CommonQueryParameters#pageDoc_and_pageScore
          Dmitry Kan added a comment -

          does the deep paging issue apply to facet paging?

          Gavin made changes -
          Link This issue depends on LUCENE-2127 [ LUCENE-2127 ]
          Gavin made changes -
          Link This issue depends upon LUCENE-2127 [ LUCENE-2127 ]
          Gavin made changes -
          Link This issue depends on LUCENE-2215 [ LUCENE-2215 ]
          Gavin made changes -
          Link This issue depends upon LUCENE-2215 [ LUCENE-2215 ]
          Uwe Schindler made changes -
          Fix Version/s 4.4 [ 12324324 ]
          Fix Version/s 4.3 [ 12324128 ]
          Otis Gospodnetic added a comment -

          How ElasticSearch handles this: http://www.elasticsearch.org/guide/reference/api/search/scroll/
          (and note how this can be used to reindex from old index to new index as mentioned at http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/ )

          Dmitry Kan added a comment -

          "Scrolling is not intended for real time user requests, it is intended for cases like scrolling over large portions of data that exists within elasticsearch to reindex it for example."

          are there any other applications for this except re-indexing?

          Also, is it known how the scrolling is implemented internally, i.e., is it efficient in transferring only what is needed to the client?

          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Steve Rowe made changes -
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.5 [ 12324743 ]
          Fix Version/s 4.4 [ 12324324 ]
          Adrien Grand made changes -
          Fix Version/s 4.6 [ 12325000 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.5 [ 12324743 ]
          Scott Stults added a comment -

          Dmitry, I think "scrolling" would help in the case of Hadoop integration, such as pulling a few hundred thousand docs based off of a query to the local node so that you can do an aggregated calculation with Pig or M/R.

          Otis Gospodnetic added a comment -

          Isn't this related to SOLR-5244, Joel Bernstein?

          Dmitry Kan added a comment -

          Scott Stults, thanks for the use case. It also leans towards offline processing, but it certainly makes sense.
          Our current use case is real-time, though, and for now we are attacking the deep paging problem differently (on the querying client side).

          Hoss Man made changes -
          Link This issue is superceded by SOLR-5463 [ SOLR-5463 ]
          Hoss Man added a comment -

          Since all of the existing code attached to this issue was committed prior to 4.0, but then nearly immediately disabled by commenting out the key bits in QParser.getPaging(), I think attempting to continue building off this existing issue would just be confusing.

          I'm marking this issue as "Resolution: Incomplete" and I've opened a new issue (SOLR-5463) to track new development towards this goal.

          Hoss Man made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Fix Version/s 4.0 [ 12322551 ]
          Fix Version/s 4.6 [ 12325000 ]
          Resolution Incomplete [ 4 ]

            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Grant Ingersoll
            • Votes:
              12
              Watchers:
              25

              Dates

              • Created:
                Updated:
                Resolved:

                Development