  1. Lucene - Core
  2. LUCENE-10207

Make TermInSetQuery usable with IndexOrDocValuesQuery

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • New

    Description

      IndexOrDocValuesQuery is very useful for picking the right execution mode for a query depending on the other parts of the query tree.

      We would like to use it to optimize the execution of TermInSetQuery. However, IndexOrDocValuesQuery only works well if the "index" query can estimate its cost without doing anything expensive, such as looking up every term of the TermInSetQuery in the terms dictionary. Maybe we could implement this for primary keys (terms.size() == sumDocFreq) by returning the number of terms of the query? Another idea is to multiply the number of terms by the average postings length, though this could be dangerous if the field has a Zipfian distribution and some terms have a much higher doc frequency than the average.
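
      To make the two ideas above concrete, here is a rough sketch in plain Java with no Lucene dependency. The class name, method name, and parameters are hypothetical and only mirror the statistics mentioned above (terms.size() and sumDocFreq). The cap at sumDocFreq is an extra assumption of this sketch, not part of the proposal.

```java
/**
 * Illustrative sketch only: a cheap cost estimate for a set-of-terms query
 * computed from field-level statistics, without touching the terms dictionary.
 * All names here are hypothetical; this is not Lucene API.
 */
final class TermInSetCostSketch {

    private TermInSetCostSketch() {}

    /**
     * @param queryTermCount number of terms in the query
     * @param fieldTermCount number of unique terms in the field (terms.size())
     * @param sumDocFreq     total number of postings in the field
     * @return estimated number of postings the query would visit
     */
    static long estimateCost(long queryTermCount, long fieldTermCount, long sumDocFreq) {
        if (queryTermCount < 0 || fieldTermCount < 0 || sumDocFreq < fieldTermCount) {
            throw new IllegalArgumentException("inconsistent statistics");
        }
        if (queryTermCount == 0 || fieldTermCount == 0) {
            return 0;
        }
        if (fieldTermCount == sumDocFreq) {
            // Primary-key-like field: every term has exactly one posting,
            // so the query visits at most one posting per query term.
            return Math.min(queryTermCount, sumDocFreq);
        }
        // Average postings length, rounded up.
        long avgPostings = (sumDocFreq + fieldTermCount - 1) / fieldTermCount;
        // Assumption of this sketch: cap at sumDocFreq, since a query cannot
        // visit more postings than the field holds. Written as a division to
        // avoid overflow in the multiplication below.
        if (queryTermCount > sumDocFreq / avgPostings) {
            return sumDocFreq;
        }
        return queryTermCount * avgPostings;
    }

    public static void main(String[] args) {
        System.out.println(estimateCost(5, 1000, 1000));     // primary key case
        System.out.println(estimateCost(5, 1000, 10_000));   // average postings length case
        System.out.println(estimateCost(2000, 1000, 10_000)); // capped case
    }
}
```

      The Zipfian concern shows up directly in these numbers: with 1000 unique terms and sumDocFreq of 10000, the average postings length is 10, so a 5-term query is estimated at 50. But a single term could hold 9001 of those postings (with the other 999 terms holding one each), in which case a query containing that term visits at least 9001 postings and the estimate is off by more than two orders of magnitude.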

      romseygeek and I were discussing this a few weeks ago, and more recently mikemccand and gsmiller discussed it again independently. So it looks like there is interest in this. Here is an email thread where this was recently discussed: https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.

          People

            gsmiller Greg Miller
            jpountz Adrien Grand
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 1h 10m