Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Lucene Fields: New
Description
TermAutomatonQuery (still in sandbox) is a simple way to get accurate query-time multi-token synonyms using the new SynonymGraphFilter from LUCENE-6664. It already has a utility class to directly translate an incoming TokenStream into a corresponding query.
However, the query is likely quite slow because it always iterates positions for all terms in the automaton.
I think one simple approach is to walk the automaton and find the subset of terms (if any) that are common to all accepting paths, and then use that subset as an approximation with ConjunctionDISI, like PhraseQuery does. Such a subset doesn't always exist for an automaton (i.e. it could be empty), so the logic would have to be conditional...
And I think there are more complex approximations we could make, but using ConjunctionDISI seems like a simple start.
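To make the "terms common to all paths" idea concrete, here is a rough sketch of the walk. It is not Lucene code: CommonTermsSketch and its methods are hypothetical stand-ins for the automaton inside TermAutomatonQuery, transitions carry plain term strings, and the graph is assumed to be acyclic (as a token graph from an analyzer would be). It also ignores "any term" transitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a term automaton; this is NOT Lucene's API. */
final class CommonTermsSketch {

  /** One labeled edge: matching {@code term} moves to state {@code dest}. */
  static final class Transition {
    final String term;
    final int dest;

    Transition(String term, int dest) {
      this.term = term;
      this.dest = dest;
    }
  }

  private final List<List<Transition>> transitions = new ArrayList<>();
  private final Set<Integer> acceptStates = new HashSet<>();

  int createState() {
    transitions.add(new ArrayList<>());
    return transitions.size() - 1;
  }

  void addTransition(int source, String term, int dest) {
    transitions.get(source).add(new Transition(term, dest));
  }

  void setAccept(int state) {
    acceptStates.add(state);
  }

  /**
   * Returns the terms that occur on every accepting path starting at
   * {@code start}. The result is empty when no such term exists.
   * Assumes the automaton has no cycles.
   */
  Set<String> commonTerms(int start) {
    Set<String> result = visit(start, new HashMap<>());
    return result == null ? Collections.<String>emptySet() : result;
  }

  // Returns null when no accepting path leaves `state`, so that dead ends
  // do not wipe out the intersection.
  private Set<String> visit(int state, Map<Integer, Set<String>> memo) {
    if (memo.containsKey(state)) {
      return memo.get(state);
    }
    // A path may stop at an accept state, contributing no further terms.
    Set<String> common = acceptStates.contains(state) ? new HashSet<>() : null;
    for (Transition t : transitions.get(state)) {
      Set<String> tail = visit(t.dest, memo);
      if (tail == null) {
        continue; // this edge never reaches an accept state
      }
      Set<String> viaEdge = new HashSet<>(tail);
      viaEdge.add(t.term);
      if (common == null) {
        common = viaEdge;
      } else {
        common.retainAll(viaEdge); // intersect across alternative edges
      }
    }
    memo.put(state, common);
    return common;
  }
}
```

For a graph that accepts "wifi network" or "wi fi network", this yields just "network", so only that term's postings would feed the conjunction. For a graph that accepts only "wifi" or "wi fi", the result is empty and the query would have to fall back to what it does today.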