Lucene - Core / LUCENE-2948

Make var gap terms index a partial prefix trie

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      The var gap terms index stores the indexed terms (every 32nd term, by
      default) in an FST, minus their non-distinguishing suffixes.
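
      A minimal sketch of how the index terms described above could be
      selected and trimmed. It assumes Java Strings stand in for Lucene's
      UTF-8 byte terms, a fixed selection interval (the default policy the
      issue mentions), and that a term's "non-distinguishing suffix" is
      everything after the first character where it differs from the term
      immediately preceding it in the terms dict. The names IndexTermPrefixes,
      distinguishingPrefixLength and indexTerms are hypothetical, and building
      the FST itself is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; Lucene itself works on UTF-8 bytes (BytesRef), not Strings.
final class IndexTermPrefixes {

    // Length of the shortest prefix of `term` that still sorts after `prior`,
    // where `prior` is the term immediately before `term` in the terms dict.
    static int distinguishingPrefixLength(String prior, String term) {
        int limit = Math.min(prior.length(), term.length());
        int i = 0;
        while (i < limit && prior.charAt(i) == term.charAt(i)) {
            i++;
        }
        // Keep the shared prefix plus the first differing char (or the whole term if shorter).
        return Math.min(i + 1, term.length());
    }

    // Picks every `interval`-th term from the sorted terms dict (assumed fixed interval)
    // and strips its non-distinguishing suffix.
    static List<String> indexTerms(List<String> sortedTerms, int interval) {
        List<String> indexed = new ArrayList<>();
        for (int ord = 0; ord < sortedTerms.size(); ord += interval) {
            String term = sortedTerms.get(ord);
            String prior = ord == 0 ? "" : sortedTerms.get(ord - 1);
            indexed.add(term.substring(0, distinguishingPrefixLength(prior, term)));
        }
        return indexed;
    }
}
```

      For example, with an interval of 2 the sorted terms apple, apricot,
      banana, bandana, cherry yield the index prefixes a, b, c. Keeping only
      the distinguishing prefix is safe because it still sorts after the
      preceding term, so a floor lookup in the index still lands on the
      right block of the terms dict.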

      However, in some portions of the term space the resulting FST is often
      "close" to a prefix trie.

      By allowing some nodes of the FST to store all outgoing edges,
      including ones that do not lead to an indexed term, and by recording
      that such a node is then "authoritative" about which terms exist in
      the terms dict under its prefix, we gain some important benefits:

      • It becomes possible to know, from the terms index alone, that no
        term with a certain prefix exists, which saves a disk seek in some
        cases (like primary-key lookup, docFreq, etc.).
      • We can query the index for the next possible prefix, allowing some
        MultiTermQuerys (MTQs, e.g. FuzzyQuery) to save disk seeks.
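
      The partial prefix trie idea can be sketched with a plain in-memory
      trie instead of an FST. This is a minimal sketch assuming String terms
      and a per-node "authoritative" flag; PartialPrefixTrie,
      makeAuthoritative, mayHavePrefix and nextPossiblePrefix are
      illustrative names, not Lucene APIs, and the file pointers into the
      terms dict are omitted. An authoritative node carries an edge for every
      next character that follows its prefix anywhere in the terms dict, so
      a missing edge there proves absence (the first benefit), and the next
      higher edge gives the next prefix worth seeking to (the second).

```java
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the FST terms index: a plain trie over chars,
// without the per-term file pointers a real index would store.
final class PartialPrefixTrie {

    static final class Node {
        final TreeMap<Character, Node> edges = new TreeMap<>();
        // If true, `edges` lists every char that follows this node's prefix in the terms dict.
        boolean authoritative;
    }

    final Node root = new Node();

    // Adds an indexed (already suffix-stripped) term to the trie.
    void addIndexTerm(String indexTerm) {
        getOrCreate(indexTerm);
    }

    // Makes the node for `prefix` authoritative by adding an edge for every next char
    // that occurs after `prefix` among all terms in the terms dict, including terms
    // that are not indexed. Assumes at least one term starts with `prefix`.
    void makeAuthoritative(String prefix, Iterable<String> allTerms) {
        Node n = getOrCreate(prefix);
        for (String t : allTerms) {
            if (t.length() > prefix.length() && t.startsWith(prefix)) {
                n.edges.computeIfAbsent(t.charAt(prefix.length()), k -> new Node());
            }
        }
        n.authoritative = true;
    }

    // Returns false only when the index proves no term has this prefix (no seek needed);
    // true means "maybe", so the caller must still consult the terms dict.
    boolean mayHavePrefix(String prefix) {
        Node n = root;
        for (int i = 0; i < prefix.length(); i++) {
            Node next = n.edges.get(prefix.charAt(i));
            if (next == null) {
                return !n.authoritative;
            }
            n = next;
        }
        return true;
    }

    // Returns `target` if terms starting with it may exist; otherwise the smallest prefix
    // greater than `target` that the index cannot rule out. Returns null when the failing
    // authoritative node has no higher edge (a fuller version would backtrack to its parent).
    String nextPossiblePrefix(String target) {
        Node n = root;
        for (int i = 0; i < target.length(); i++) {
            char c = target.charAt(i);
            Node next = n.edges.get(c);
            if (next == null) {
                if (!n.authoritative) {
                    return target; // nothing can be ruled out below a non-authoritative node
                }
                Character higher = n.edges.higherKey(c);
                return higher == null ? null : target.substring(0, i) + higher;
            }
            n = next;
        }
        return target;
    }

    private Node getOrCreate(String prefix) {
        Node n = root;
        for (int i = 0; i < prefix.length(); i++) {
            n = n.edges.computeIfAbsent(prefix.charAt(i), k -> new Node());
        }
        return n;
    }
}
```

      For example, after indexing a, b, c from the terms apple, apricot,
      banana, bandana, berry, cherry and marking the root and "b"
      authoritative, mayHavePrefix("bo") returns false without touching the
      terms dict, while nextPossiblePrefix("bc") returns "be". Which nodes to
      make authoritative is left to the caller in this sketch; each one costs
      extra edges in the index.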

      Basically, the terms index is able to answer questions that previously
      required seeking/scanning in the terms dict file.

      Attachments

        1. Results.png (29 kB, Michael McCandless)
        2. LUCENE-2948.patch (100 kB, Michael McCandless)
        3. LUCENE-2948.patch (101 kB, Michael McCandless)
        4. LUCENE-2948.patch (118 kB, Michael McCandless)
        5. LUCENE-2948_automaton.patch (3 kB, Robert Muir)

            People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Michael McCandless (mikemccand)
            Votes: 0
            Watchers: 0
