Affects Version/s: None
Fix Version/s: None
I think Query.rewrite(IndexReader) should be generalized a bit to enable users to customize the rewrite process outside of the Query classes (i.e. without having to create a custom Query just to implement rewrite). This could be as simple as having rewrite accept a QueryRewriter parameter that has a method that accepts a Query to be rewritten. Only this method would call rewrite on any given Query. And given very few spots actually use the IndexReader arg, we could even remove that as a parameter and add a getter to QueryRewriter (which is allowed to return null). Or create a subclass e.g. QueryRewriterWithIndexReader if some prefer casting; debatable.
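To make the proposed shape concrete, here is a minimal sketch using stand-in classes rather than real Lucene types. Every name in it (SketchQuery, SketchRewriter, SketchTerm, SketchBoolean, DefaultSketchRewriter, FieldRenamingRewriter) is hypothetical and chosen for illustration only. The reader getter returns a plain Object as a placeholder for IndexReader. The default rewriter loops to a fixed point, which is an assumption about how a default implementation might behave.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for Lucene's Query; hypothetical, not the real class.
abstract class SketchQuery {
    // Proposed shape: rewrite receives a rewriter rather than an IndexReader.
    SketchQuery rewrite(SketchRewriter rewriter) {
        return this; // leaf queries have nothing to rewrite by default
    }
}

// Hypothetical QueryRewriter from the proposal.
interface SketchRewriter {
    SketchQuery rewrite(SketchQuery query);
    Object getIndexReader(); // placeholder for IndexReader; allowed to return null
}

final class SketchTerm extends SketchQuery {
    final String field;
    final String text;
    SketchTerm(String field, String text) { this.field = field; this.text = text; }
    @Override public String toString() { return field + ":" + text; }
}

final class SketchBoolean extends SketchQuery {
    final List<SketchQuery> clauses;
    SketchBoolean(List<SketchQuery> clauses) { this.clauses = clauses; }
    static SketchBoolean of(SketchQuery... clauses) {
        return new SketchBoolean(Arrays.asList(clauses));
    }

    // Children are rewritten only through the rewriter, never directly.
    @Override SketchQuery rewrite(SketchRewriter rewriter) {
        List<SketchQuery> out = new ArrayList<>();
        boolean changed = false;
        for (SketchQuery clause : clauses) {
            SketchQuery rewritten = rewriter.rewrite(clause);
            changed |= rewritten != clause;
            out.add(rewritten);
        }
        return changed ? new SketchBoolean(out) : this;
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < clauses.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(clauses.get(i));
        }
        return sb.append(')').toString();
    }
}

// Assumed default behavior: call rewrite until the query stops changing.
class DefaultSketchRewriter implements SketchRewriter {
    @Override public SketchQuery rewrite(SketchQuery query) {
        SketchQuery current = query;
        SketchQuery next = current.rewrite(this);
        while (next != current) {
            current = next;
            next = current.rewrite(this);
        }
        return current;
    }
    @Override public Object getIndexReader() { return null; }
}

// App-level customization: replace one field with another, no custom Query needed.
final class FieldRenamingRewriter extends DefaultSketchRewriter {
    private final String from;
    private final String to;
    FieldRenamingRewriter(String from, String to) { this.from = from; this.to = to; }

    @Override public SketchQuery rewrite(SketchQuery query) {
        if (query instanceof SketchTerm) {
            SketchTerm term = (SketchTerm) query;
            return term.field.equals(from) ? new SketchTerm(to, term.text) : term;
        }
        return super.rewrite(query); // everything else: default behavior
    }
}
```

The point of the sketch is that SketchBoolean never calls rewrite on its children itself; it only goes through the rewriter, so the app gets a hook at every node without subclassing any query.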
Today, users have to hard-code Lucene class names with related logic for each one. This is annoying and tedious, brittle as Lucene adds new Query classes, and tends to be duplicative. Examples of why an app might want to rewrite a query:
- to replace position-sensitive queries that are not already SpanQuery instances with their SpanQuery equivalents. This is useful in highlighting – Luwak's SpanRewriter does this.
- to simplify BooleanQuery instances to a canonical form, or to apply other canonicalization, such as for a BoostQuery with a boost of 1. The point is to simplify or strengthen the accuracy of query examination logic for whatever further purpose (e.g. routing a query for an optimization).
- to replace one field with another
- to "fix" pure negative queries so that they work (by adding a MatchAllDocsQuery). I'm surprised we still live with this.
- to relax a query that doesn't match to a looser one that does (e.g. manipulate minimumNumberShouldMatch) without re-parsing the query. Granted re-parsing affords using different analysis or other strategies.
- to make it easier to use a Lucene Query class as a base class during query parsing/building. You could rewrite to strip out/replace only the AST nodes and leave the real Lucene Queries as-is.
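As one illustration of the use cases above, the pure negative "fix" could be sketched as follows. This uses stand-in classes, not real Lucene types: QNode, NodeRewriter, TermNode, MatchAllNode, Clause, BoolNode and PureNegativeFixer are all hypothetical names, and the Occur enum here is a local copy for the sketch. It assumes a rewrite method that takes the rewriter, as proposed.

```java
import java.util.ArrayList;
import java.util.List;

enum Occur { MUST, SHOULD, MUST_NOT }

// Stand-in for Query; hypothetical.
abstract class QNode {
    QNode rewrite(NodeRewriter rewriter) { return this; }
}

// Stand-in for the proposed QueryRewriter; hypothetical.
interface NodeRewriter {
    QNode rewrite(QNode node);
}

final class TermNode extends QNode {
    final String field;
    final String text;
    TermNode(String field, String text) { this.field = field; this.text = text; }
    @Override public String toString() { return field + ":" + text; }
}

final class MatchAllNode extends QNode {
    @Override public String toString() { return "*:*"; }
}

final class Clause {
    final Occur occur;
    final QNode node;
    Clause(Occur occur, QNode node) { this.occur = occur; this.node = node; }
}

final class BoolNode extends QNode {
    final List<Clause> clauses;
    BoolNode(List<Clause> clauses) { this.clauses = clauses; }

    boolean isPureNegative() {
        if (clauses.isEmpty()) return false;
        for (Clause c : clauses) {
            if (c.occur != Occur.MUST_NOT) return false;
        }
        return true;
    }

    // Delegates each child to the rewriter; rebuilds only if something changed.
    @Override QNode rewrite(NodeRewriter rewriter) {
        List<Clause> out = new ArrayList<>();
        boolean changed = false;
        for (Clause c : clauses) {
            QNode rewritten = rewriter.rewrite(c.node);
            changed |= rewritten != c.node;
            out.add(new Clause(c.occur, rewritten));
        }
        return changed ? new BoolNode(out) : this;
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < clauses.size(); i++) {
            Clause c = clauses.get(i);
            if (i > 0) sb.append(' ');
            if (c.occur == Occur.MUST) sb.append('+');
            if (c.occur == Occur.MUST_NOT) sb.append('-');
            sb.append(c.node);
        }
        return sb.append(')').toString();
    }
}

// Adds a required match-all clause to any boolean node that has only MUST_NOT clauses.
final class PureNegativeFixer implements NodeRewriter {
    @Override public QNode rewrite(QNode node) {
        QNode rewritten = node.rewrite(this); // handle children first
        if (rewritten instanceof BoolNode && ((BoolNode) rewritten).isPureNegative()) {
            List<Clause> fixed = new ArrayList<>(((BoolNode) rewritten).clauses);
            fixed.add(0, new Clause(Occur.MUST, new MatchAllNode()));
            return new BoolNode(fixed);
        }
        return rewritten;
    }
}
```

The fixer contains no per-class traversal logic; it only inspects the node it is handed, and nested pure negative clauses are reached because BoolNode delegates its children back to the rewriter.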
Until LUCENE-3041 (generic Query visitor) is addressed, a customizable rewrite would allow a generic query visitor: a QueryRewriter that doesn't actually rewrite anything. It's a little abusive, since it does wasted work and no rewrite actually occurs, but I think the overhead needn't be that much, and such a use-case might even special-case BooleanQuery in particular to lower the overhead further. Basically, for known many-child aggregator Query classes, customize to simply delegate.
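A rough sketch of the visitor idea, again with stand-in classes rather than real Lucene types (VQuery, VRewriter, VTerm, VBool and FieldCollector are hypothetical names): the rewriter records what it sees and always hands back the query it was given, so the aggregator never rebuilds itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for Query; hypothetical.
abstract class VQuery {
    VQuery rewrite(VRewriter rewriter) { return this; }
}

// Stand-in for the proposed QueryRewriter; hypothetical.
interface VRewriter {
    VQuery rewrite(VQuery query);
}

final class VTerm extends VQuery {
    final String field;
    VTerm(String field) { this.field = field; }
}

// Many-child aggregator: simply delegates each child to the rewriter.
final class VBool extends VQuery {
    final List<VQuery> children;
    VBool(VQuery... children) { this.children = Arrays.asList(children); }

    @Override VQuery rewrite(VRewriter rewriter) {
        List<VQuery> out = new ArrayList<>();
        boolean changed = false;
        for (VQuery child : children) {
            VQuery rewritten = rewriter.rewrite(child);
            changed |= rewritten != child;
            out.add(rewritten);
        }
        // With a visiting rewriter nothing changes, so this list is the wasted work.
        return changed ? new VBool(out.toArray(new VQuery[0])) : this;
    }
}

// A "rewriter" that only visits: it collects term fields and rewrites nothing.
final class FieldCollector implements VRewriter {
    final List<String> fields = new ArrayList<>();

    @Override public VQuery rewrite(VQuery query) {
        if (query instanceof VTerm) {
            fields.add(((VTerm) query).field);
        }
        query.rewrite(this); // descend into children; the result is ignored
        return query;        // always return the input unchanged
    }
}
```

The temporary list built in VBool.rewrite is the overhead mentioned above. A visiting rewriter could special-case such aggregators and iterate their children directly to avoid it.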