Lucene - Core
  1. Lucene - Core
  2. LUCENE-2933

Two-stage state expansion for the FST: distance-from-root and child-count criteria.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None

      Description

      In the current implementation FST states are expanded into a binary search on labels (from a vector of transitions) when the child count of a state exceeds a given predefined threshold (NUM_ARCS_FIXED_ARRAY). This threshold affects automaton size and traversal speed (as it turns out when benchmarked). For some degenerate data sets, close-to-the-root nodes could have a small number of children (below the threshold) and yet be traversed on every single seek.

      A fix of this is to introduce two control thresholds:
      EXPAND state if (distance-from-root <= MIN_DISTANCE || children-count >= NUM_ARCS_FIXED_ARRAY)

      My plan is to create a data set that will prove this first and then to implement the workaround above.

      1. out copy.png
        214 kB
        Dawid Weiss
      2. LUCENE-2933.patch
        12 kB
        Dawid Weiss

        Activity

        Dawid Weiss created issue -
        Dawid Weiss made changes -
        Field Original Value New Value
        Assignee Dawid Weiss [ dweiss ]
        Dawid Weiss made changes -
        Attachment LUCENE-2933.patch [ 12471560 ]
        Dawid Weiss made changes -
        Attachment LUCENE-2933.patch [ 12471560 ]
        Dawid Weiss made changes -
        Attachment LUCENE-2933.patch [ 12471572 ]
        Dawid Weiss made changes -
        Attachment out copy.png [ 12471573 ]
        Dawid Weiss made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Lucene Fields [New]
        Resolution Fixed [ 1 ]
        Uwe Schindler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Dawid Weiss
            Reporter:
            Dawid Weiss
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development