  1. Lucene - Core
  2. LUCENE-7627

Improve TermsEnum automaton filtering APIs


Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 6.4
    • None
    • None
    • New

    Description

      To filter a TermsEnum by a CompiledAutomaton, we currently have a number of different possibilities:

      • Terms.intersect(CompiledAutomaton, BytesRef) - efficient, but only works on NORMAL type automata
      • CompiledAutomaton.getTerms(Terms) - efficient, works on all automaton types, but requires a Terms rather than a TermsEnum, so it is no use for e.g. SortedDocValues.termsEnum()
      • AutomatonTermsEnum - takes a TermsEnum, so it's more general than the Terms methods above, but again only works on NORMAL automata

      It's easy to do the wrong thing here, and at the moment we only guard against incorrect usage via runtime checks (see e.g. LUCENE-7576, https://github.com/flaxsearch/marple/issues/24). We should try to clean this up.
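
      To make the gap concrete, here is a toy model of a single entry point that accepts any term iterator and handles every automaton type itself, so callers cannot pick a NORMAL-only path by mistake. It is a sketch only, not the attached patch and not Lucene code: TermFilterSketch, Compiled and filter are hypothetical names, an Iterator<String> stands in for TermsEnum, and a Predicate<String> stands in for the automaton. The four type names mirror CompiledAutomaton.AUTOMATON_TYPE.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: every type here is a hypothetical stand-in, not a Lucene class.
final class TermFilterSketch {

    // Same four categories as CompiledAutomaton.AUTOMATON_TYPE.
    enum Type { NONE, ALL, SINGLE, NORMAL }

    // Stand-in for a compiled automaton: a type plus the data that type needs.
    static final class Compiled {
        final Type type;
        final String single;            // consulted only when type == SINGLE
        final Predicate<String> accept; // consulted only when type == NORMAL

        Compiled(Type type, String single, Predicate<String> accept) {
            this.type = type;
            this.single = single;
            this.accept = accept;
        }
    }

    // One entry point over a plain iterator (the TermsEnum stand-in).
    // Every type is handled here, so no runtime "NORMAL only" check is needed.
    static List<String> filter(Iterator<String> terms, Compiled c) {
        List<String> out = new ArrayList<>();
        switch (c.type) {
            case NONE:
                // Matches nothing: leave the result empty.
                return out;
            case ALL:
                // Matches everything: pass all terms through.
                terms.forEachRemaining(out::add);
                return out;
            case SINGLE:
                // Matches exactly one term: stop at the first hit.
                while (terms.hasNext()) {
                    String t = terms.next();
                    if (t.equals(c.single)) {
                        out.add(t);
                        break;
                    }
                }
                return out;
            case NORMAL:
                // General case: test each term against the matcher.
                while (terms.hasNext()) {
                    String t = terms.next();
                    if (c.accept.test(t)) {
                        out.add(t);
                    }
                }
                return out;
            default:
                throw new AssertionError("unknown type: " + c.type);
        }
    }
}
```

      In this model the caller never inspects the type; an API that takes a TermsEnum and works for all automaton types would give the same guarantee at compile time rather than through runtime checks.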

      Attachments

        1. LUCENE-7627.patch
          5 kB
          Alan Woodward
        2. LUCENE-7627.patch
          5 kB
          Alan Woodward

        Activity

          People

            romseygeek Alan Woodward
            romseygeek Alan Woodward
            Votes: 0
            Watchers: 4
