Lucene - Core / LUCENE-8990

IndexOrDocValuesQuery can take a bad decision for range queries if field has many values per document


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.3
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The heuristics of IndexOrDocValuesQuery are somewhat inconsistent for range queries. The leadCost that is provided is based on the number of documents, whereas the cost() of a range query is based on the number of points that potentially match the query.

      Therefore a BKD tree may contain millions of points that correspond to just a few documents. As a result, we may decide to execute the query using doc values when in fact we end up scanning almost all the points.
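To make the unit mismatch concrete, the Java sketch below models the decision as a single comparison between the range query's cost() and leadCost. This is a simplified model, not Lucene's code: the penalty factor of 8, the class and method names, and the numbers are assumptions chosen for illustration.

```java
/**
 * Simplified, hypothetical model of the IndexOrDocValuesQuery decision.
 * Not Lucene's implementation; the penalty factor is an assumption.
 */
final class CostMismatchDemo {

    // Hypothetical penalty factor: keep the points (BKD) path unless its
    // cost exceeds PENALTY times the lead iterator's cost.
    static final long PENALTY = 8;

    /** Returns true if the points path would be chosen, false for doc values. */
    static boolean usePoints(long rangeCost, long leadCost) {
        return rangeCost / PENALTY <= leadCost;
    }

    public static void main(String[] args) {
        long leadCost = 10_000;          // lead clause matches ~10k documents
        long matchingPoints = 2_000_000; // range matches ~2M points...
        long pointsPerDoc = 1_000;       // ...but every document holds ~1k values
        // ~2k documents, assuming each document's values all fall in the range
        long matchingDocs = matchingPoints / pointsPerDoc;

        // Cost counted in points looks far more expensive than the lead clause.
        System.out.println("point-based cost -> points? " + usePoints(matchingPoints, leadCost)); // false
        // Cost counted in documents is actually cheaper than the lead clause.
        System.out.println("doc-based cost   -> points? " + usePoints(matchingDocs, leadCost));   // true
    }
}
```

With the cost expressed in points, the model switches to doc values even though only about 2,000 documents match, which is the situation the description warns about.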

      Maybe the cost() function for range queries needs to take into account the average number of points per document in the tree and adjust its value accordingly.
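One way to read this suggestion is to convert the estimated number of matching points into an estimated number of matching documents, using the total number of points N and the number of documents D in the tree. The sketch below shows two hypothetical conversions. It is not necessarily the committed fix, and the class and method names are invented for illustration. The linear version divides by the average points per document. The uniform version assumes matching points are scattered at random, so a document with N/D points is missed with probability ((N - n) / N)^(N/D).

```java
/**
 * Hypothetical helpers that turn a point-count estimate into a
 * document-count estimate. Not Lucene's API.
 */
final class DocCountEstimate {

    /** Divide matching points by the average number of points per document. */
    static long linear(long matchingPoints, long totalPoints, int docCount) {
        if (totalPoints == 0 || matchingPoints == 0) return 0;
        if (matchingPoints >= totalPoints) return docCount; // everything matches
        double pointsPerDoc = (double) totalPoints / docCount;
        return Math.max(1L, (long) (matchingPoints / pointsPerDoc));
    }

    /** Assume the n matching points are spread uniformly at random over documents. */
    static long uniform(long matchingPoints, long totalPoints, int docCount) {
        if (totalPoints == 0 || matchingPoints == 0) return 0;
        if (matchingPoints >= totalPoints) return docCount;     // everything matches
        if (totalPoints == docCount) return matchingPoints;     // single-valued field
        double n = matchingPoints, bigN = totalPoints, d = docCount;
        // Probability that a document with N/D points has none of the matching points.
        double missProbability = Math.pow((bigN - n) / bigN, bigN / d);
        return Math.max(1L, (long) (d * (1 - missProbability)));
    }

    public static void main(String[] args) {
        long totalPoints = 10_000_000; // 10M points...
        int docCount = 10_000;         // ...in 10k docs, i.e. 1,000 points per doc
        long matchingPoints = 2_000_000;
        System.out.println("raw point cost: " + matchingPoints);
        System.out.println("linear docs:    " + linear(matchingPoints, totalPoints, docCount));  // 2000
        System.out.println("uniform docs:   " + uniform(matchingPoints, totalPoints, docCount)); // 10000
    }
}
```

The two estimates bracket the realistic range. The linear one fits fields where a document's values tend to match together, while the uniform one fits values scattered independently across documents. Real data usually sits between them, so either is only a heuristic.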

       


              People

              • Assignee: Ignacio Vera (ivera)
              • Reporter: Ignacio Vera (ivera)
              • Votes: 0
              • Watchers: 7


                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 2h