Description
Spinoff from LUCENE-6582.
I created a new SynonymGraphFilter (to replace the current buggy
SynonymFilter) that produces correct graphs: it does no "graph
flattening" itself. I think this makes the filter simpler.
This means you must add the FlattenGraphFilter yourself, if you are
applying synonyms during indexing.
Index-time synonym expansion is necessarily a "lossy" graph
transformation when multi-token (input or output) synonyms are
applied, because the index does not store posLength. There will
always be phrase queries that should match but do not, and phrase
queries that should not match but do.
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
goes into detail about this.
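
To make the lossiness concrete, here is a minimal, self-contained sketch (plain Java, no Lucene classes). It assumes a hypothetical document "dns is up" and a hypothetical synonym rule dns -> "domain name service", and models the index as storing only a term and a position per token. The class and method names (FlattenedIndexDemo, IndexedToken, phraseMatches) are made up for illustration, and the positions are what a flattened stream would plausibly look like, not output captured from a real filter.

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenedIndexDemo {
    /** A token as the index stores it: term plus position only, no posLength. */
    static final class IndexedToken {
        final String term;
        final int position;

        IndexedToken(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /** Assumed index-time tokens for "dns is up" with dns -> "domain name service". */
    static List<IndexedToken> indexDnsIsUp() {
        List<IndexedToken> tokens = new ArrayList<>();
        tokens.add(new IndexedToken("dns", 0));     // should span 3 positions, but that is not stored
        tokens.add(new IndexedToken("domain", 0));
        tokens.add(new IndexedToken("name", 1));
        tokens.add(new IndexedToken("service", 2));
        tokens.add(new IndexedToken("is", 3));
        tokens.add(new IndexedToken("up", 4));
        return tokens;
    }

    static boolean hasTermAt(List<IndexedToken> tokens, String term, int position) {
        for (IndexedToken t : tokens) {
            if (t.position == position && t.term.equals(term)) {
                return true;
            }
        }
        return false;
    }

    /** Exact phrase match: the phrase terms must sit at consecutive positions. */
    static boolean phraseMatches(List<IndexedToken> tokens, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        for (IndexedToken start : tokens) {
            if (!start.term.equals(phrase[0])) {
                continue;
            }
            boolean all = true;
            for (int i = 1; i < phrase.length && all; i++) {
                all = hasTermAt(tokens, phrase[i], start.position + i);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<IndexedToken> doc = indexDnsIsUp();
        System.out.println(phraseMatches(doc, "dns", "is"));     // false: should match, does not
        System.out.println(phraseMatches(doc, "dns", "name"));   // true: should not match, does
        System.out.println(phraseMatches(doc, "service", "is")); // true: correct
    }
}
```

Because "dns" is recorded at position 0 with no length, the phrase "dns is" looks for "is" at position 1 and misses it, while "dns name" wrongly finds "name" there.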
However, with this new SynonymGraphFilter, if you instead do synonym
expansion at query time (without the flattening) and use
TermAutomatonQuery (future: somehow integrated into a query parser),
or maybe just "enumerate all paths and make a union of PhraseQuery",
you should get 100% correct matches (not sure about "proper" scoring
though...).
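
The "enumerate all paths" idea can be sketched without any Lucene classes. The following is a toy model, assuming a hypothetical query "dns is" and a hypothetical rule dns -> "domain name service". Each arc carries a term, the node it leaves from, and its posLength, so the graph is not flattened. The names TokenGraphPaths, Arc and enumeratePaths are invented for this example, and every posLength is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenGraphPaths {
    /** One arc of the token graph: leaves node `from`, arrives at node `from + posLength`. */
    static final class Arc {
        final String term;
        final int from;
        final int posLength;

        Arc(String term, int from, int posLength) {
            this.term = term;
            this.from = from;
            this.posLength = posLength;
        }
    }

    /** Assumed query-time graph for "dns is" with dns -> "domain name service". */
    static List<Arc> queryGraphDnsIs() {
        List<Arc> arcs = new ArrayList<>();
        arcs.add(new Arc("dns", 0, 3));      // original token spans the whole synonym
        arcs.add(new Arc("domain", 0, 1));
        arcs.add(new Arc("name", 1, 1));
        arcs.add(new Arc("service", 2, 1));
        arcs.add(new Arc("is", 3, 1));
        return arcs;
    }

    /** Returns every term sequence leading from node 0 to endNode. */
    static List<List<String>> enumeratePaths(List<Arc> arcs, int endNode) {
        List<List<String>> paths = new ArrayList<>();
        walk(arcs, 0, endNode, new ArrayList<>(), paths);
        return paths;
    }

    // Depth-first walk; terminates because every arc moves strictly forward (posLength >= 1).
    private static void walk(List<Arc> arcs, int node, int endNode,
                             List<String> prefix, List<List<String>> paths) {
        if (node == endNode) {
            paths.add(new ArrayList<>(prefix));
            return;
        }
        for (Arc arc : arcs) {
            if (arc.from != node) {
                continue;
            }
            prefix.add(arc.term);
            walk(arcs, node + arc.posLength, endNode, prefix, paths);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Prints [[dns, is], [domain, name, service, is]]
        System.out.println(enumeratePaths(queryGraphDnsIs(), 4));
    }
}
```

Each returned path would become one exact phrase, and the final query would be the OR of those phrases, run against an index built without synonym expansion. The number of paths can grow quickly when several multi-token synonyms overlap, which is one reason an automaton-based query may be preferable.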
This new synonym filter still cannot consume an arbitrary graph as its input.
Issue Links
- is blocked by LUCENE-6721 How to handle back-compat for new graph TokenFilters? (Open)