  Lucene - Core / LUCENE-6717

TermAutomatonQuery should be two-phased

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      TermAutomatonQuery (still in sandbox) is a simple way to get accurate query-time multi-token synonyms using the new SynonymGraphFilter from LUCENE-6664. It already has a utility class to directly translate an incoming TokenStream into a corresponding query.

      However, the query is likely quite slow, because it always iterates positions for all terms in the automaton.

      I think one simple approach is to walk the automaton, find the subset of terms (if any) that appear on every path through it, and then use a ConjunctionDISI over those terms as the approximation, like PhraseQuery does. Such a subset doesn't always exist for an automaton (i.e. it could be empty), so the logic would have to be conditional.
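      A minimal sketch of the "terms common to all paths" walk, assuming a toy automaton given as a list of transitions, with state 0 as the start state and a null term standing for an any-term transition. RequiredTerms and Transition are hypothetical names, not Lucene's TermAutomatonQuery API. It runs a forward "must-appear" dataflow: each state keeps the terms seen on every path that reaches it, each edge contributes its source's set plus its own term, and contributions arriving at a state are intersected until nothing changes. Because the sets only shrink, this also terminates on cyclic automata. An empty result means no conjunction approximation is available.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; the automaton representation is a toy, not Lucene's API. */
public class RequiredTerms {

    /** Edge labeled with a term; a null term stands for an "any term" wildcard. */
    record Transition(int from, int to, String term) {}

    /** Terms that appear on every path from state 0 to any reachable accept state. */
    static Set<String> requiredTerms(int numStates, List<Transition> transitions, Set<Integer> accept) {
        // must.get(s) == null means "not reached yet" (top of the lattice: all terms).
        List<Set<String>> must = new ArrayList<>();
        for (int i = 0; i < numStates; i++) {
            must.add(null);
        }
        must.set(0, new HashSet<>()); // nothing is required to reach the start state

        boolean changed = true;
        while (changed) {
            changed = false;
            for (Transition t : transitions) {
                Set<String> src = must.get(t.from());
                if (src == null) {
                    continue; // source state not reached yet
                }
                // Terms required to reach t.to() through this particular edge.
                Set<String> viaEdge = new HashSet<>(src);
                if (t.term() != null) {
                    viaEdge.add(t.term());
                }
                Set<String> dst = must.get(t.to());
                if (dst == null) {
                    must.set(t.to(), viaEdge);
                    changed = true;
                } else if (dst.retainAll(viaEdge)) {
                    changed = true; // intersecting with this path shrank the set
                }
            }
        }

        // A term is required only if every reachable accept state requires it.
        Set<String> result = null;
        for (int s : accept) {
            Set<String> m = must.get(s);
            if (m == null) {
                continue; // unreachable accept state
            }
            if (result == null) {
                result = new HashSet<>(m);
            } else {
                result.retainAll(m);
            }
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        // Hypothetical automaton for the synonym graph "wifi | wi fi" followed by "network".
        List<Transition> ts = List.of(
                new Transition(0, 2, "wifi"),
                new Transition(0, 1, "wi"),
                new Transition(1, 2, "fi"),
                new Transition(2, 3, "network"));
        System.out.println(requiredTerms(4, ts, Set.of(3))); // prints [network]
    }
}
```

      Here only network is required, so it would be the sole term in the approximation. For an automaton whose alternatives share no term at all, the method returns an empty set, and the query would have to keep its current, unapproximated behavior.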

      I think there are more complex approximations we could make, but using ConjunctionDISI seems like a simple start.
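      The two-phase idea can be illustrated without Lucene: a cheap leapfrog conjunction over the required terms' doc ids (in the spirit of ConjunctionDISI) proposes candidate documents, and the expensive position check runs only on those candidates. This is a hypothetical sketch, assuming postings are plain sorted int arrays and that an IntPredicate stands in for the per-document position matching; TwoPhaseSketch, search and advance are illustrative names, not Lucene classes or methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical sketch: int arrays stand in for postings, an IntPredicate for position checks. */
public class TwoPhaseSketch {

    /** Smallest index i >= from with docs[i] >= target, or docs.length if there is none. */
    static int advance(int[] docs, int from, int target) {
        int i = from;
        while (i < docs.length && docs[i] < target) {
            i++;
        }
        return i;
    }

    /** Docs containing every required term (phase 1) that also pass the check (phase 2). */
    static List<Integer> search(List<int[]> requiredPostings, IntPredicate matches) {
        List<Integer> hits = new ArrayList<>();
        int n = requiredPostings.size();
        if (n == 0) {
            return hits; // no required terms: a real query would fall back to unapproximated matching
        }
        int[] pos = new int[n]; // current index into each postings list
        int[] lead = requiredPostings.get(0);
        while (pos[0] < lead.length) {
            int candidate = lead[pos[0]];
            boolean allMatch = true;
            for (int i = 1; i < n; i++) {
                int[] docs = requiredPostings.get(i);
                pos[i] = advance(docs, pos[i], candidate);
                if (pos[i] == docs.length) {
                    return hits; // a required term is exhausted: no more candidates
                }
                if (docs[pos[i]] > candidate) {
                    // Leapfrog: jump the lead forward to the larger doc id and retry.
                    pos[0] = advance(lead, pos[0], docs[pos[i]]);
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                // Phase 2: the expensive verification runs only on approximation hits.
                if (matches.test(candidate)) {
                    hits.add(candidate);
                }
                pos[0]++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] fast = {1, 4, 5, 9};        // hypothetical postings for required term "fast"
        int[] network = {2, 4, 5, 7, 9};  // hypothetical postings for required term "network"
        int[] checks = {0};
        // Pretend positions line up in every candidate except doc 5.
        IntPredicate positionsMatch = doc -> {
            checks[0]++;
            return doc != 5;
        };
        System.out.println(search(List.of(fast, network), positionsMatch)); // prints [4, 9]
        System.out.println(checks[0]); // prints 3: only conjunction hits were verified
    }
}
```

      In the example, docs 1, 2 and 7 never reach the position check because they lack one of the required terms. In Lucene itself, this split between a cheap approximation and a confirming step is what TwoPhaseIterator's approximation() and matches() express.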

      Attachments

        1. LUCENE-6717.patch (31 kB, Michael McCandless)
        2. LUCENE-6717.patch (30 kB, Michael McCandless)


          People

            Assignee: Michael McCandless
            Reporter: Michael McCandless
            Votes: 1
            Watchers: 5

            Dates

              Created:
              Updated: