  1. Lucene - Core
  2. LUCENE-8990

IndexOrDocValuesQuery can take a bad decision for range queries if field has many values per document

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.3
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      The heuristics of IndexOrDocValuesQuery are inconsistent for range queries. The leadCost it is given is expressed as a number of documents, whereas the cost() of a range query is based on the number of points that potentially match the query.

      As a result, a BKD tree may hold millions of points that correspond to just a few documents. In that case we can decide to execute the query using doc values when in fact we end up scanning almost all the points.
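
To make the mismatch concrete, here is a minimal sketch of a decision of this kind. It does not use any Lucene classes: `CostMismatchSketch`, `useDocValues`, the `factor` parameter and all numbers are hypothetical, chosen only to show what happens when a cost counted in points is compared against a lead cost counted in documents.

```java
// Hypothetical, simplified stand-in for the decision; not Lucene's actual code.
final class CostMismatchSketch {

    /**
     * Returns true when the range clause looks expensive relative to the lead
     * clause, so that checking the lead documents through doc values is chosen
     * over walking the points index.
     *
     * @param rangeCost cost reported by the range query
     * @param leadCost  number of documents the leading clause is expected to visit
     * @param factor    assumed safety factor; must be positive, value is illustrative only
     */
    static boolean useDocValues(long rangeCost, long leadCost, long factor) {
        return rangeCost / factor > leadCost;
    }
}
```

For example, suppose a field is present in only 10 documents that together hold 10,000,000 points, a range matches about 5,000,000 of those points, and the lead clause visits 1,000 documents. With an assumed factor of 8, `useDocValues(5_000_000, 1_000, 8)` picks doc values, whereas the same comparison with a document-based cost of at most 10, `useDocValues(10, 1_000, 8)`, picks the index.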

      Maybe the cost() function for range queries needs to take into account the average number of points per document in the tree and adjust its value accordingly.
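
One way to read this suggestion is sketched below. This is an illustration under stated assumptions, not the change that was committed for this issue: `AdjustedRangeCostSketch` and `estimateDocCost` are hypothetical names, and the three inputs are assumed to be the kind of values that `PointValues` exposes (an estimated matching point count, `size()` and `getDocCount()`). It simply divides the point estimate by the average number of points per document.

```java
// Hypothetical sketch of a document-based cost estimate; not Lucene's actual code.
final class AdjustedRangeCostSketch {

    /**
     * Converts an estimated number of matching points into an estimated
     * number of matching documents.
     *
     * @param estimatedPoints estimated number of points matching the range
     * @param totalPoints     total number of points indexed for the field
     * @param docCount        number of documents that have at least one point
     */
    static long estimateDocCost(long estimatedPoints, long totalPoints, int docCount) {
        if (estimatedPoints <= 0 || totalPoints <= 0 || docCount <= 0) {
            return 0L; // nothing indexed or nothing expected to match
        }
        if (estimatedPoints >= totalPoints) {
            return docCount; // every point matches, so every document matches
        }
        if (totalPoints == docCount) {
            return estimatedPoints; // single-valued field: points and documents coincide
        }
        double avgPointsPerDoc = (double) totalPoints / docCount;
        long docs = (long) Math.ceil(estimatedPoints / avgPointsPerDoc);
        return Math.min(docs, (long) docCount); // never more documents than exist
    }
}
```

The plain division assumes that matching points are concentrated in as few documents as possible, so it can underestimate the number of matching documents when the matches are spread thinly across many of them. A more careful estimate would need to model that spread.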

            People

              Assignee: Ignacio Vera (ivera)
              Reporter: Ignacio Vera (ivera)
              Votes: 0
              Watchers: 7

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h