  1. Lucene - Core
  2. LUCENE-8184

Enable flexible Query.rewrite

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      I think Query.rewrite(IndexReader) should be generalized a bit so that users can customize the rewrite process outside of the Query classes (i.e. without having to create a custom Query just to implement rewrite).  This could be as simple as having rewrite accept a QueryRewriter parameter, which has a method that accepts the Query to be rewritten.  Only this method would call rewrite on any given Query.  And given that very few spots actually use the IndexReader arg, we could even remove it as a parameter and add a getter to QueryRewriter (which is allowed to return null).  Or create a subclass, e.g. QueryRewriterWithIndexReader, if some prefer casting; debatable.
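To make the shape of the proposal concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `Reader`, `Query`, `TermQ` and `BoolQ` are small stand-ins rather than Lucene's real classes, and `QueryRewriter`, `DefaultRewriter` and `FieldRenamer` are hypothetical names that do not exist in Lucene. It only shows how an aggregator query could route its children through the rewriter, so that an application can hook in without writing a custom Query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: every type here is a stand-in, not a real Lucene class. */
public class QueryRewriterSketch {

  /** Stand-in for IndexReader. */
  interface Reader {}

  /** Hypothetical callback from the proposal. */
  interface QueryRewriter {
    /** The only place that should call query.rewrite(this). */
    Query rewrite(Query query);

    /** Optional reader; allowed to return null. */
    default Reader getIndexReader() { return null; }
  }

  /** Stand-in for Query: leaf queries rewrite to themselves by default. */
  abstract static class Query {
    Query rewrite(QueryRewriter rewriter) { return this; }
  }

  static final class TermQ extends Query {
    final String field;
    final String text;
    TermQ(String field, String text) { this.field = field; this.text = text; }
    @Override public String toString() { return field + ":" + text; }
  }

  /** Aggregator: sends every child to the rewriter, never to child.rewrite directly. */
  static final class BoolQ extends Query {
    final List<Query> clauses;
    BoolQ(List<Query> clauses) { this.clauses = clauses; }

    @Override Query rewrite(QueryRewriter rewriter) {
      List<Query> rewritten = new ArrayList<>();
      boolean changed = false;
      for (Query clause : clauses) {
        Query r = rewriter.rewrite(clause);
        changed |= r != clause;
        rewritten.add(r);
      }
      return changed ? new BoolQ(rewritten) : this; // keep identity if nothing changed
    }
    @Override public String toString() { return clauses.toString(); }
  }

  /** Plain behavior: just delegate to the query's own rewrite. */
  static class DefaultRewriter implements QueryRewriter {
    @Override public Query rewrite(Query query) { return query.rewrite(this); }
  }

  /** App-level customization without a custom Query: swap one field for another. */
  static final class FieldRenamer extends DefaultRewriter {
    final String from;
    final String to;
    FieldRenamer(String from, String to) { this.from = from; this.to = to; }

    @Override public Query rewrite(Query query) {
      if (query instanceof TermQ) {
        TermQ t = (TermQ) query;
        if (t.field.equals(from)) return new TermQ(to, t.text);
      }
      return super.rewrite(query); // everything else rewrites as usual
    }
  }

  public static void main(String[] args) {
    Query q = new BoolQ(Arrays.asList(new TermQ("title", "lucene"), new TermQ("body", "rewrite")));
    System.out.println(new FieldRenamer("title", "title_exact").rewrite(q));
    // prints: [title_exact:lucene, body:rewrite]
  }
}
```

The design point is that `BoolQ.rewrite` never calls `clause.rewrite(...)` itself; it always goes back through the rewriter, which is what gives the application a single interception point.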
       
      Today, users who want to do this have to hard-code Lucene Query class names, with related logic for each one.  This is tedious, brittle as Lucene adds new Query classes, and tends to be duplicative.  Examples of why an app might want to rewrite a query:

      • to replace position-sensitive queries that are not already SpanQuery instances with their SpanQuery equivalent.  This is useful in highlighting; Luwak's SpanRewriter does this.
      • to simplify BooleanQuery instances to a canonical form, or apply other canonicalization such as unwrapping a BoostQuery whose boost is 1.  The point is to simplify, or improve the accuracy of, query examination logic for whatever further purpose (e.g. routing a query for an optimization).
      • to replace one field with another.
      • to "fix" pure negative queries so that they work (by adding a MatchAllDocsQuery).  I'm surprised we still live with this.
      • to relax a query that doesn't match into a looser one that does (e.g. by manipulating minimumNumberShouldMatch) without re-parsing the query.  Granted, re-parsing affords using different analysis or other strategies.
      • to make it easier to use a Lucene Query class as a base class for AST nodes during query parsing/building.  You could then rewrite to strip out or replace only the AST nodes and leave the real Lucene queries as-is.
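As one illustration of the list above, the pure-negative fix could be written as a rewriter along these lines. This is a sketch under the same assumptions as the proposal itself: `QueryRewriter` is the hypothetical callback, and `Query`, `TermQ`, `MatchAllQ`, `BoolQ`, `Clause` and `Occur` are simplified stand-ins, not Lucene's BooleanQuery, BooleanClause or MatchAllDocsQuery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: stand-in types, not real Lucene classes. */
public class PureNegativeFixSketch {

  /** Hypothetical callback from the proposal. */
  interface QueryRewriter { Query rewrite(Query query); }

  abstract static class Query {
    Query rewrite(QueryRewriter rewriter) { return this; }
  }

  static final class TermQ extends Query {
    final String field;
    final String text;
    TermQ(String field, String text) { this.field = field; this.text = text; }
    @Override public String toString() { return field + ":" + text; }
  }

  /** Stand-in for a match-all-documents query. */
  static final class MatchAllQ extends Query {
    @Override public String toString() { return "*:*"; }
  }

  enum Occur { MUST, SHOULD, MUST_NOT }

  static final class Clause {
    final Occur occur;
    final Query query;
    Clause(Occur occur, Query query) { this.occur = occur; this.query = query; }
    @Override public String toString() {
      String prefix = occur == Occur.MUST ? "+" : occur == Occur.MUST_NOT ? "-" : "";
      return prefix + query;
    }
  }

  static final class BoolQ extends Query {
    final List<Clause> clauses;
    BoolQ(List<Clause> clauses) { this.clauses = clauses; }

    /** Children are rewritten through the rewriter, not directly. */
    @Override Query rewrite(QueryRewriter rewriter) {
      List<Clause> out = new ArrayList<>();
      boolean changed = false;
      for (Clause c : clauses) {
        Query r = rewriter.rewrite(c.query);
        changed |= r != c.query;
        out.add(r == c.query ? c : new Clause(c.occur, r));
      }
      return changed ? new BoolQ(out) : this;
    }
    @Override public String toString() { return clauses.toString(); }
  }

  /** Adds a required match-all clause to any boolean query that has only MUST_NOT clauses. */
  static final class PureNegativeFixer implements QueryRewriter {
    @Override public Query rewrite(Query query) {
      Query rewritten = query.rewrite(this); // fix nested queries first
      if (rewritten instanceof BoolQ) {
        BoolQ b = (BoolQ) rewritten;
        boolean pureNegative = !b.clauses.isEmpty();
        for (Clause c : b.clauses) pureNegative &= c.occur == Occur.MUST_NOT;
        if (pureNegative) {
          List<Clause> fixed = new ArrayList<>(b.clauses);
          fixed.add(0, new Clause(Occur.MUST, new MatchAllQ()));
          return new BoolQ(fixed);
        }
      }
      return rewritten;
    }
  }

  public static void main(String[] args) {
    Query negative = new BoolQ(Arrays.asList(new Clause(Occur.MUST_NOT, new TermQ("body", "spam"))));
    System.out.println(new PureNegativeFixer().rewrite(negative));
    // prints: [+*:*, -body:spam]
  }
}
```

Because the fixer is itself the rewriter that `BoolQ.rewrite` calls back into, nested pure-negative clauses get fixed too, without any per-class traversal code in the application.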

      Finally, until LUCENE-3041 (generic Query visitor) is addressed, a customizable rewrite would allow a generic query visitor, using a QueryRewriter that doesn't actually rewrite anything.  It's a little abusive, since it does wasted work and no rewrite actually occurs, but I think the overhead needn't be large, and such a use case might even special-case BooleanQuery to lower the overhead further.  Basically, for known many-child aggregator Query classes, customize the rewriter to simply delegate to the children.
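The visitor idea might look roughly like the sketch below. As before, all types are stand-ins invented for illustration (`QueryRewriter` is the hypothetical callback; `Query`, `TermQ`, `BoostQ` and `BoolQ` are not Lucene's classes). The collector always returns the query it was given, so nothing is rewritten, and it walks the boolean stand-in's children directly as the special case mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: stand-in types, not real Lucene classes. */
public class VisitorViaRewriteSketch {

  /** Hypothetical callback from the proposal. */
  interface QueryRewriter { Query rewrite(Query query); }

  abstract static class Query {
    Query rewrite(QueryRewriter rewriter) { return this; }
  }

  static final class TermQ extends Query {
    final String field;
    final String text;
    TermQ(String field, String text) { this.field = field; this.text = text; }
    @Override public String toString() { return field + ":" + text; }
  }

  /** Single-child wrapper; its rewrite routes the child through the rewriter. */
  static final class BoostQ extends Query {
    final Query inner;
    final float boost;
    BoostQ(Query inner, float boost) { this.inner = inner; this.boost = boost; }
    @Override Query rewrite(QueryRewriter rewriter) {
      Query r = rewriter.rewrite(inner);
      return r == inner ? this : new BoostQ(r, boost);
    }
  }

  /** Many-child aggregator; its rewrite allocates a list even when nothing changes. */
  static final class BoolQ extends Query {
    final List<Query> clauses;
    BoolQ(List<Query> clauses) { this.clauses = clauses; }
    @Override Query rewrite(QueryRewriter rewriter) {
      List<Query> out = new ArrayList<>();
      boolean changed = false;
      for (Query c : clauses) {
        Query r = rewriter.rewrite(c);
        changed |= r != c;
        out.add(r);
      }
      return changed ? new BoolQ(out) : this;
    }
  }

  /** A "rewriter" that only observes: it records term queries and returns its input. */
  static final class TermCollector implements QueryRewriter {
    final List<String> seen = new ArrayList<>();

    @Override public Query rewrite(Query query) {
      if (query instanceof TermQ) {
        seen.add(query.toString());
      }
      if (query instanceof BoolQ) {
        // Special case: visit children directly and skip BoolQ.rewrite's list building.
        for (Query child : ((BoolQ) query).clauses) rewrite(child);
        return query;
      }
      query.rewrite(this); // other aggregators call back into this method; result is ignored
      return query;        // never actually rewrite anything
    }
  }

  public static void main(String[] args) {
    Query q = new BoolQ(Arrays.asList(new TermQ("title", "a"), new BoostQ(new TermQ("body", "b"), 2.0f)));
    TermCollector collector = new TermCollector();
    collector.rewrite(q);
    System.out.println(collector.seen);
    // prints: [title:a, body:b]
  }
}
```

The wasted work the description mentions is visible in the `BoostQ` path: its `rewrite` still runs its comparison logic even though the result is thrown away, which is the cost of reusing rewrite as a traversal.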

            People

              Assignee: Unassigned
              Reporter: David Smiley (dsmiley)
