Details
-
Wish
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
-
New
Description
LUCENE-6664 has a patch for a new synonym filter that correctly handles multi-token synonyms, unlike the known bugs we have today (see http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html for examples).
But since existing query parsers and indexer (and I'm sure many other external analysis consumers) ignore PosLenAtt, that patch also has a back-compat layer, SausageGraphFilter, to "squash" the graph back down so these components work as best they can...
Anyway, unless we can figure out how to make the back-compat even better than SausageGraphFilter, we can't really move forward with graph token filters.
Robert suggested entirely new attributes for graph token streams, but I don't see how that can work: it seems like we'd then need to have 2 copies of certain token filters, e.g. StopFilter and StopGraphFilter.
Maybe we could fix indexer and at least our query parsers to barf if they every see PosLenAtt != 1? Then you'd know you need to add the back compat layer to your analysis chain...
Attachments
Issue Links
- blocks
-
LUCENE-6664 Replace SynonymFilter with SynonymGraphFilter
- Resolved