Lucene - Core / LUCENE-6717

TermAutomatonQuery should be two-phased


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • New

    Description

      TermAutomatonQuery (still in sandbox) is a simple way to get accurate query-time multi-token synonyms using the new SynonymGraphFilter from LUCENE-6664. It already has a utility class to directly translate an incoming TokenStream into a corresponding query.

      However, the query is likely quite slow because it always iterates positions for all terms in the automaton.

      I think one simple approach is to walk the automaton and find the subset of terms (if any) that appear on every accepting path, and then approximate with ConjunctionDISI over those terms, like PhraseQuery does. Such a subset doesn't always exist for an automaton (i.e. it could be empty), so the logic would have to be conditional.
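
To make the "terms common to all paths" idea concrete, here is a minimal sketch. It does not use Lucene's classes: `TermAutomatonSketch`, `addTransition`, `setAccept`, and `commonTerms` are hypothetical names for a simplified automaton whose transitions are labeled with plain term strings and whose initial state is assumed to be 0. It relies on one observation: a term is on every accepting path exactly when removing all transitions labeled with that term makes every accept state unreachable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in for a term automaton; this is NOT Lucene's API. */
class TermAutomatonSketch {

    /** One labeled transition: from --term--> to. */
    static final class Transition {
        final int from;
        final String term;
        final int to;

        Transition(int from, String term, int to) {
            this.from = from;
            this.term = term;
            this.to = to;
        }
    }

    private final List<Transition> transitions = new ArrayList<>();
    private final Set<Integer> acceptStates = new HashSet<>();

    void addTransition(int from, String term, int to) {
        transitions.add(new Transition(from, term, to));
    }

    void setAccept(int state) {
        acceptStates.add(state);
    }

    /**
     * Breadth-first search from state 0 (assumed initial state) that skips
     * every transition labeled {@code banned}. Passing null bans nothing.
     */
    private boolean acceptsWithout(String banned) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        seen.add(0);
        queue.add(0);
        while (!queue.isEmpty()) {
            int state = queue.poll();
            if (acceptStates.contains(state)) {
                return true;
            }
            for (Transition t : transitions) {
                if (t.from == state && !t.term.equals(banned) && seen.add(t.to)) {
                    queue.add(t.to);
                }
            }
        }
        return false;
    }

    /** Terms that occur on every accepting path; empty if there is no such term. */
    Set<String> commonTerms() {
        Set<String> required = new TreeSet<>();
        if (!acceptsWithout(null)) {
            return required; // the automaton accepts nothing at all
        }
        Set<String> allTerms = new TreeSet<>();
        for (Transition t : transitions) {
            allTerms.add(t.term);
        }
        for (String term : allTerms) {
            if (!acceptsWithout(term)) {
                required.add(term); // no accepting path survives without this term
            }
        }
        return required;
    }

    public static void main(String[] args) {
        // "wifi network" with the multi-token synonym "wi fi network".
        TermAutomatonSketch a = new TermAutomatonSketch();
        a.addTransition(0, "wifi", 2);
        a.addTransition(0, "wi", 1);
        a.addTransition(1, "fi", 2);
        a.addTransition(2, "network", 3);
        a.setAccept(3);
        System.out.println(a.commonTerms()); // prints [network]
    }
}
```

In the example only "network" is shared by both paths, so only that term could drive a conjunction. When the result is empty, the query would presumably have to fall back to its current behavior, which is the conditional logic mentioned above. The sketch runs one graph search per distinct term, which is simple rather than fast.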

      And I think there are more complex approximations we could make, but using ConjunctionDISI seems like a simple start.
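
As a rough illustration of what the conjunction approximation buys, the sketch below intersects sorted doc ID lists for the required terms and yields only the candidate documents that contain all of them. It uses plain `int[]` arrays as a stand-in for postings and a simple merge-style intersection; `ConjunctionApproximationSketch` and `candidates` are hypothetical names, and this is not how ConjunctionDISI itself is implemented. The expensive position check would then run only on the returned candidates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a conjunction approximation over sorted doc ID arrays. */
class ConjunctionApproximationSketch {

    /** Returns the doc IDs present in every sorted, duplicate-free postings array. */
    static List<Integer> candidates(int[][] postings) {
        List<Integer> result = new ArrayList<>();
        if (postings.length == 0) {
            return result;
        }
        int[] pos = new int[postings.length]; // current cursor into each array
        int[] lead = postings[0];
        outer:
        for (int doc : lead) {
            for (int j = 1; j < postings.length; j++) {
                int[] p = postings[j];
                // Advance this cursor to the first doc ID >= doc.
                while (pos[j] < p.length && p[pos[j]] < doc) {
                    pos[j]++;
                }
                if (pos[j] == p.length) {
                    break outer; // one list is exhausted: no further matches possible
                }
                if (p[pos[j]] != doc) {
                    continue outer; // this doc lacks a required term
                }
            }
            result.add(doc); // every required term occurs in this doc
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] postings = {
            {1, 3, 5, 9},
            {3, 4, 9},
            {2, 3, 9, 10},
        };
        System.out.println(candidates(postings)); // prints [3, 9]
    }
}
```

Only documents 3 and 9 would need their positions iterated here; the other documents are rejected without touching positions at all.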

      Attachments

        1. LUCENE-6717.patch
          30 kB
          Michael McCandless
        2. LUCENE-6717.patch
          31 kB
          Michael McCandless

        Activity


          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Michael McCandless (mikemccand)
