SOLR-8037

speed up term range queries, use filter cache for embedded ranges

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Enhance term range queries (i.e. non-numeric range queries) by:

      • Implementing DocSetProducer to directly construct filters (fq, etc.) for term range queries.
      • Allowing range queries that are embedded in other queries to automatically use the filter cache if the number of terms in the range is large enough.
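
      The second point can be illustrated with a toy model. This is a minimal sketch and not the code from the patch: the class name, the threshold value, and the in-memory "index" and "cache" below are all hypothetical stand-ins, and the issue itself only says the number of terms must be "large enough".

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model (not Solr code): cache the doc set of a term range only when it spans enough terms. */
final class RangeCacheSketch {
    // Hypothetical threshold; the real value is not stated in the issue.
    static final int MIN_TERMS_FOR_CACHE = 3;

    // term -> ids of documents containing that term (stand-in for an inverted index)
    final NavigableMap<String, BitSet> postings = new TreeMap<>();
    // range key -> cached doc set (stand-in for the filter cache)
    final Map<String, BitSet> filterCache = new HashMap<>();
    int cacheHits = 0;

    void addDoc(int docId, String term) {
        postings.computeIfAbsent(term, t -> new BitSet()).set(docId);
    }

    /** Union the postings of every term in the inclusive range [lower, upper]. */
    BitSet buildDocSet(String lower, String upper) {
        BitSet result = new BitSet();
        for (BitSet docs : postings.subMap(lower, true, upper, true).values()) {
            result.or(docs);
        }
        return result;
    }

    /** Resolve a range embedded in a larger query, consulting the cache first. */
    BitSet matchRange(String lower, String upper) {
        String key = "[" + lower + " TO " + upper + "]";
        BitSet cached = filterCache.get(key);
        if (cached != null) {
            cacheHits++;
            return (BitSet) cached.clone();
        }
        int termCount = postings.subMap(lower, true, upper, true).size();
        BitSet docs = buildDocSet(lower, upper);
        // Small ranges are cheap to re-evaluate, so only wide ranges are cached.
        if (termCount >= MIN_TERMS_FOR_CACHE) {
            filterCache.put(key, (BitSet) docs.clone());
        }
        return docs;
    }
}
```

      In this sketch a range covering two terms is evaluated directly on every call, while a range covering three or more terms is stored once and served from the cache afterwards.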
      1. SOLR-8037.patch
        31 kB
        Yonik Seeley

          Activity

          yseeley@gmail.com Yonik Seeley added a comment -

          Performance was tested on various fields in an index with 5M docs and some deleted documents.
          Timing measurements were done by the client and thus represent all phases of the request (reading, query parsing, execution, response writing).
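
          The comment does not say how the speedup percentages were derived. As a sketch only, assuming a speedup is the relative gain in throughput, i.e. (time before / time after - 1) * 100, client-side timing and the percentage could be computed as follows. The class and method names are hypothetical and do not come from the benchmark that was actually used.

```java
/** Sketch of client-side timing and a speedup percentage; the formula is an assumption. */
final class SpeedupSketch {
    /** Average wall-clock milliseconds per call, as observed by the caller. */
    static double averageMillis(Runnable request, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            request.run(); // in a real test this would send one full request
        }
        return (System.nanoTime() - start) / 1e6 / iterations;
    }

    /** Assumed definition: 100% means the new code handles requests twice as fast. */
    static double speedupPercent(double beforeMillis, double afterMillis) {
        if (beforeMillis <= 0 || afterMillis <= 0) {
            throw new IllegalArgumentException("timings must be positive");
        }
        return (beforeMillis / afterMillis - 1.0) * 100.0;
    }
}
```

          Under this reading, a request that took 200 ms before the change and 100 ms after it would be reported as a 100% speedup.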

          Rough performance results:

          Filter production for range queries covering 1-10 terms:

          Unique terms in field    Speedup
          100                      161%
          1000                     77%
          10000                    76%
          100000                   79%
          1000000                  51%

          Filter production for range queries covering all terms in field:

          Unique terms in field    Speedup
          100                      133%
          1000                     116%
          10000                    55%
          100000                   18%
          1000000                  6%

          Performance of queries containing range queries, with 100% filter cache hits, for medium range queries (matching ~100 terms):

          Unique terms in field    Speedup
          100                      134%
          1000                     24%
          10000                    23%
          100000                   2%
          1000000                  4%

          Performance of queries containing range queries, with 100% filter cache hits, for range queries covering all terms:

          Unique terms in field    Speedup
          100                      118%
          1000                     90%
          10000                    170%
          100000                   438%
          1000000                  908%
          jira-bot ASF subversion and git services added a comment -

          Commit 1702661 from Yonik Seeley in branch 'dev/trunk'
          [ https://svn.apache.org/r1702661 ]

          SOLR-8037: speed up term range queries, use filter cache for embedded ranges


            People

            • Assignee: Unassigned
            • Reporter: yseeley@gmail.com Yonik Seeley
            • Votes: 0
            • Watchers: 3
