  1. Lucene - Core
  2. LUCENE-7621

Per-document minShouldMatch

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.1, 8.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      I have seen similar requirements a couple of times but could not find a related issue, so I am opening one now. The idea would be to allow passing a LongValuesSource rather than an integer as the minShouldMatch parameter of BooleanQuery, so that the number of required clauses can depend on the document being matched. The implementation looks straightforward: we would just have to update the value of minShouldMatch in MinShouldMatchSumScorer.setDocAndFreq, and things would still be efficient, i.e. we would still use advance on the costly clauses.
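
      To make the idea concrete, here is a minimal sketch in plain Java that does not use Lucene at all. It only models the matching rule: each clause is a set of doc IDs, and the required clause count is looked up per document instead of being a fixed integer. The class name, the IntToLongFunction stand-in for LongValuesSource, and the choice to treat values below 1 as 1 are all assumptions made for illustration, not the proposed implementation.

```java
import java.util.List;
import java.util.Set;
import java.util.function.IntToLongFunction;

/** Toy model of a disjunction whose minShouldMatch is looked up per document. */
final class PerDocMinShouldMatch {

    /**
     * Returns true if {@code doc} is matched by at least as many clauses as
     * {@code minShouldMatch} requires for that document.
     * Each clause is modelled as the set of doc IDs it matches.
     */
    static boolean matches(int doc, List<Set<Integer>> clauses, IntToLongFunction minShouldMatch) {
        // Assumption: a per-document value below 1 is treated as 1.
        long required = Math.max(1, minShouldMatch.applyAsLong(doc));
        long freq = 0;
        for (Set<Integer> clause : clauses) {
            if (clause.contains(doc)) {
                freq++; // this clause matches the candidate document
            }
        }
        return freq >= required;
    }
}
```

      In a real scorer the threshold would presumably be refreshed when moving to a new candidate document, so that the existing logic for skipping over costly clauses can stay as it is.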

      This kind of feature would make it possible to run queries that must match e.g. 80% of the terms that a document contains (by indexing the number of terms in a separate field). It would also make it possible for Luwak or ES' percolator to index boolean queries that have a value of minShouldMatch greater than 1 more efficiently.
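
      As a rough illustration of the 80% use case, the sketch below (plain Java, no Lucene) derives the per-document threshold from the document's term count and checks a set of query terms against it. The class and method names are hypothetical, and taking the document's distinct terms as its term count is an assumption; in an index that count would come from the separate field mentioned above.

```java
import java.util.Set;

/** Toy model of "match at least N% of the terms that a document contains". */
final class CoverageMatch {

    /** Smallest clause count that covers {@code percent}% of {@code termCount} terms (integer ceiling). */
    static long requiredMatches(long termCount, int percent) {
        return (termCount * percent + 99) / 100;
    }

    /** Returns true if the query terms cover at least {@code percent}% of the document's terms. */
    static boolean matches(Set<String> queryTerms, Set<String> docTerms, int percent) {
        // Assumption: at least one term must match, even for very short documents.
        long required = Math.max(1, requiredMatches(docTerms.size(), percent));
        long freq = docTerms.stream().filter(queryTerms::contains).count();
        return freq >= required;
    }
}
```

      For a document with 5 terms and percent = 80, requiredMatches returns 4, so a query sharing only 3 of those terms would not match.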

      I do not have any plans to work on it soon, but I am curious how much interest this feature would attract.

      Attachments

        1. LUCENE-7621.patch
          19 kB
          Adrien Grand

            People

              Assignee: Unassigned
              Reporter: Adrien Grand (jpountz)
              Votes: 1
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: