Solr / SOLR-2218

Performance of start= and rows= parameters degrades sharply with large data sets


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.1
    • Fix Version/s: None
    • Component/s: Build
    • Labels: None

      Description

      This concerns large data sets (> 10M rows).

      Setting start=<large number> and rows=<large number> is slow, and it gets slower the farther the offset moves from start=0, especially with a complex query. Random sorting makes it slower still.
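      The reason it degrades, roughly: to return the page at offset N, the top N+rows matches have to be collected and ranked, and the first N are then thrown away. A minimal Lucene-level sketch of that pattern (an illustration under that assumption, not Solr's actual code path):

      import java.io.IOException;
      import java.util.Arrays;
      import org.apache.lucene.search.IndexSearcher;
      import org.apache.lucene.search.Query;
      import org.apache.lucene.search.ScoreDoc;
      import org.apache.lucene.search.TopDocs;

      class OffsetPaging {
          // Fetch one page of hits. The search itself must collect offset + rows
          // hits before the first offset of them can be skipped, so the work per
          // page grows as the offset grows.
          static ScoreDoc[] page(IndexSearcher searcher, Query query, int offset, int rows) throws IOException {
              TopDocs top = searcher.search(query, offset + rows);
              int end = Math.min(offset + rows, top.scoreDocs.length);
              return offset >= end ? new ScoreDoc[0]
                                   : Arrays.copyOfRange(top.scoreDocs, offset, end);
          }
      }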

      We would like to make looping through large result sets faster. It would be nice if we could pass a pointer to the result set to resume iteration, or to support very large rows=<number> values.

      Something like:
      rows=1000
      start=0
      spointer=string_my_query_1

      Then, within some interval (say, 5 minutes), the same result set could be referenced to continue the loop:
      rows=1000
      start=1000
      spointer=string_my_query_1

      What do you think? Since the data set is so large, the cache is not helping.
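      The loop described above has to be driven today by incrementing start=, which is exactly the part that gets more expensive as the offset grows. A minimal SolrJ sketch of that client-side loop (the collection URL, query, and page size are placeholders, and the HttpSolrClient API shown is from a newer SolrJ than the 1.4-era client):

      import org.apache.solr.client.solrj.SolrQuery;
      import org.apache.solr.client.solrj.impl.HttpSolrClient;
      import org.apache.solr.client.solrj.response.QueryResponse;
      import org.apache.solr.common.SolrDocumentList;

      public class DeepPagingLoop {
          public static void main(String[] args) throws Exception {
              // Placeholder URL and query; adjust for the actual deployment.
              try (HttpSolrClient client =
                       new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
                  final int rows = 1000;
                  long start = 0;
                  long numFound;
                  do {
                      SolrQuery q = new SolrQuery("*:*");
                      q.setStart((int) start);  // every page re-collects the first start+rows matches,
                      q.setRows(rows);          // so each iteration is slower than the last
                      QueryResponse rsp = client.query(q);
                      SolrDocumentList docs = rsp.getResults();
                      numFound = docs.getNumFound();
                      // ... process docs ...
                      start += rows;
                  } while (start < numFound);
              }
          }
      }

      With the spointer idea, the client would hand back an opaque token instead of an ever-growing offset, so each page would cost roughly the same.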

    People

    • Assignee: Unassigned
    • Reporter: billnbell (Bill Bell)
    • Votes: 2
    • Watchers: 3

    Dates

    • Created:
    • Updated:
    • Resolved: