[LUCENE-8184] Enable flexible Query.rewrite - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None
Lucene Fields: New

Description

I think Query.rewrite(IndexReader) should be generalized so that users can customize the rewrite process outside of the Query classes (i.e. without having to create a custom Query just to implement rewrite). This could be as simple as having rewrite accept a QueryRewriter parameter with a method that accepts the Query to be rewritten. Only this method would call rewrite on any given Query. And given that very few spots actually use the IndexReader arg, we could even remove it as a parameter and add a getter to QueryRewriter (which would be allowed to return null). Or create a subclass, e.g. QueryRewriterWithIndexReader, if some prefer casting; debatable.
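To make the shape of this proposal concrete, here is a minimal, dependency-free sketch. Everything in it is an assumption for illustration: `ToyQuery`, `ToyTerm` and `ToyBoolean` are toy stand-ins rather than Lucene classes, `QueryRewriter` only borrows the name suggested above, and `getIndexReader()` returns `Object` so the example compiles without Lucene on the classpath. `FieldRenamer` is a hypothetical customization.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a query node; NOT org.apache.lucene.search.Query. */
interface ToyQuery {
    /** Rewrites children via the rewriter only; returns this when nothing changed. */
    ToyQuery rewrite(QueryRewriter rewriter);
}

/** Hypothetical callback named after the proposal; not an existing Lucene type. */
interface QueryRewriter {
    /** The single place that decides how a query is rewritten. */
    ToyQuery rewrite(ToyQuery query);

    /** Stand-in for the optional IndexReader getter; allowed to return null. */
    default Object getIndexReader() {
        return null;
    }
}

final class ToyTerm implements ToyQuery {
    final String field;
    final String text;

    ToyTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public ToyQuery rewrite(QueryRewriter rewriter) {
        return this; // leaf: nothing to delegate
    }

    @Override
    public String toString() {
        return field + ":" + text;
    }
}

final class ToyBoolean implements ToyQuery {
    final List<ToyQuery> clauses;

    ToyBoolean(List<ToyQuery> clauses) {
        this.clauses = new ArrayList<>(clauses);
    }

    @Override
    public ToyQuery rewrite(QueryRewriter rewriter) {
        List<ToyQuery> rewritten = new ArrayList<>();
        boolean changed = false;
        for (ToyQuery clause : clauses) {
            // Children are rewritten through the rewriter, never directly.
            ToyQuery r = rewriter.rewrite(clause);
            changed |= (r != clause);
            rewritten.add(r);
        }
        return changed ? new ToyBoolean(rewritten) : this;
    }

    @Override
    public String toString() {
        return clauses.toString();
    }
}

/** Hypothetical customization: swap one field for another from outside the query classes. */
final class FieldRenamer implements QueryRewriter {
    final String from;
    final String to;

    FieldRenamer(String from, String to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public ToyQuery rewrite(ToyQuery query) {
        if (query instanceof ToyTerm) {
            ToyTerm term = (ToyTerm) query;
            if (term.field.equals(from)) {
                return new ToyTerm(to, term.text);
            }
        }
        return query.rewrite(this); // default: let the query delegate to its children
    }
}
```

In this toy model, `new FieldRenamer("title", "headline")` applied to a `ToyBoolean` over `title:lucene` and `body:rewrite` yields a new `ToyBoolean` over `headline:lucene` and `body:rewrite`, while an unaffected query is returned as the same instance.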

Today, users have to hard-code Lucene class names with related logic for each one. This is tedious, tends to be duplicative, and is brittle as Lucene adds new queries. Examples of why an app might want to rewrite a query:

- To replace position-sensitive queries that are not already SpanQuery instances with their SpanQuery equivalent. This is useful in highlighting; Luwak's SpanRewriter does this.
- To simplify BooleanQuery instances to a canonical form, or to apply other canonicalization such as unwrapping a BoostQuery with a boost of 1. The point is to simplify query examination logic, or make it more accurate, for whatever further purpose (e.g. routing a query for an optimization).
- To replace one field with another.
- To "fix" pure negative queries so that they work (by adding a MatchAllDocsQuery). I'm surprised we still live with this.
- To relax a query that doesn't match into a looser one that does (e.g. by manipulating minimumNumberShouldMatch) without re-parsing the query. Granted, re-parsing affords using different analysis or other strategies.
- To make it easier to use a Lucene Query class as a base class during query parsing/building. You could rewrite to strip out or replace only the AST nodes and leave the real Lucene Queries as-is.
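As an illustration of the pure-negative case from the list above, the following sketch adds a match-all clause to a boolean query whose clauses are all prohibited. It is a toy model under stated assumptions: `NegQuery`, `NegTerm`, `NegMatchAll`, `NegClause`, `NegBoolean`, `NegOccur` and `PureNegativeFixer` are hypothetical names, not Lucene classes, and only the top-level query is inspected.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy clause roles, loosely modelled on must / should / must-not. */
enum NegOccur { MUST, SHOULD, MUST_NOT }

/** Toy marker interface; NOT a Lucene type. */
interface NegQuery {}

final class NegMatchAll implements NegQuery {
    @Override
    public String toString() {
        return "*:*";
    }
}

final class NegTerm implements NegQuery {
    final String field;
    final String text;

    NegTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public String toString() {
        return field + ":" + text;
    }
}

final class NegClause {
    final NegOccur occur;
    final NegQuery query;

    NegClause(NegOccur occur, NegQuery query) {
        this.occur = occur;
        this.query = query;
    }
}

final class NegBoolean implements NegQuery {
    final List<NegClause> clauses;

    NegBoolean(List<NegClause> clauses) {
        this.clauses = new ArrayList<>(clauses);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (NegClause c : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (c.occur == NegOccur.MUST) {
                sb.append('+');
            } else if (c.occur == NegOccur.MUST_NOT) {
                sb.append('-');
            }
            sb.append(c.query);
        }
        return sb.toString();
    }
}

/** Hypothetical helper: makes a purely negative boolean query matchable. */
final class PureNegativeFixer {
    static NegQuery fix(NegQuery query) {
        if (!(query instanceof NegBoolean)) {
            return query;
        }
        NegBoolean bool = (NegBoolean) query;
        if (bool.clauses.isEmpty()) {
            return query;
        }
        for (NegClause c : bool.clauses) {
            if (c.occur != NegOccur.MUST_NOT) {
                return query; // at least one positive clause: leave untouched
            }
        }
        // Only prohibited clauses: give them something to subtract from.
        List<NegClause> fixed = new ArrayList<>();
        fixed.add(new NegClause(NegOccur.MUST, new NegMatchAll()));
        fixed.addAll(bool.clauses);
        return new NegBoolean(fixed);
    }
}
```

With a customizable rewrite, logic like `PureNegativeFixer.fix` could live in the rewriter callback and be applied at every level of the query tree rather than only at the top.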

Finally, until ~~LUCENE-3041~~ (generic Query visitor) is addressed, a customizable rewrite would allow generic query visiting via a QueryRewriter that doesn't actually rewrite anything. It's a little abusive, since it does wasted work and no rewrite actually occurs, but I think the overhead needn't be that much, and such a use-case might even special-case BooleanQuery in particular to lower the overhead further. Basically, for known many-child aggregator Query classes, customize to simply delegate.
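The visitor idea can be sketched as a rewriter that always returns its input and only records what it sees. This is a toy model, not Lucene code: `WalkQuery`, `WalkRewriter`, `WalkTerm`, `WalkBoost`, `WalkBoolean` and `TermCollector` are hypothetical names. The boolean stand-in is special-cased so its children are walked directly, while any other wrapper goes through its own `rewrite` and the result is discarded.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a query node; NOT a Lucene type. */
interface WalkQuery {
    WalkQuery rewrite(WalkRewriter rewriter);
}

/** Hypothetical rewriter callback. */
interface WalkRewriter {
    WalkQuery rewrite(WalkQuery query);
}

final class WalkTerm implements WalkQuery {
    final String field;
    final String text;

    WalkTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public WalkQuery rewrite(WalkRewriter rewriter) {
        return this; // leaf
    }

    @Override
    public String toString() {
        return field + ":" + text;
    }
}

final class WalkBoost implements WalkQuery {
    final WalkQuery inner;
    final float boost;

    WalkBoost(WalkQuery inner, float boost) {
        this.inner = inner;
        this.boost = boost;
    }

    @Override
    public WalkQuery rewrite(WalkRewriter rewriter) {
        WalkQuery r = rewriter.rewrite(inner);
        return r == inner ? this : new WalkBoost(r, boost);
    }
}

final class WalkBoolean implements WalkQuery {
    final List<WalkQuery> clauses;

    WalkBoolean(List<WalkQuery> clauses) {
        this.clauses = new ArrayList<>(clauses);
    }

    @Override
    public WalkQuery rewrite(WalkRewriter rewriter) {
        List<WalkQuery> rewritten = new ArrayList<>();
        boolean changed = false;
        for (WalkQuery clause : clauses) {
            WalkQuery r = rewriter.rewrite(clause);
            changed |= (r != clause);
            rewritten.add(r);
        }
        return changed ? new WalkBoolean(rewritten) : this;
    }
}

/** A "rewriter" used purely as a visitor: it never changes the query. */
final class TermCollector implements WalkRewriter {
    final List<String> terms = new ArrayList<>();

    @Override
    public WalkQuery rewrite(WalkQuery query) {
        if (query instanceof WalkTerm) {
            terms.add(query.toString());
        } else if (query instanceof WalkBoolean) {
            // Special case for the many-child aggregator: walk the children
            // directly and skip building the temporary list in rewrite().
            for (WalkQuery clause : ((WalkBoolean) query).clauses) {
                rewrite(clause);
            }
        } else {
            // Generic path: reuse the query's own traversal, ignore the result.
            query.rewrite(this);
        }
        return query;
    }
}
```

Calling `rewrite` on a `TermCollector` returns the same query instance it was given, and afterwards `terms` holds the leaf terms in traversal order.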

Issue Links

is related to

LUCENE-3041 Support Query Visting / Walking

Resolved

People

Assignee: Unassigned

Reporter: David Smiley

Votes: 0

Watchers: 2

Dates

Created: 24/Feb/18 06:50

Updated: 28/Aug/22 15:26