Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      While testing, I found there are some queries (e.g. the wildcard ?????????) that do quite a lot of backtracking.

      nextString doesn't handle this particularly well: when it walks the DFA and hits a dead end that requires backtracking, it increments the bytes and starts over completely.

      Alternatively, it could save the path information in an int[], and backtrack() could return a position to restart from instead of just a boolean.
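The idea can be sketched roughly as follows. This is only an illustrative sketch with made-up names (PathWalker, walk, backtrack, stateAt), not Lucene's actual code: the walk records each intermediate DFA state in an int[], so when a dead end forces a byte to be incremented, the walk can resume from the saved state at that position instead of from the start state.

```java
// Hedged sketch of the proposal: save the DFA path in an int[] so that
// backtrack() can return a resume position instead of just a boolean.
// All names here are illustrative; this is not Lucene's actual code.
class PathWalker {
  private final int[][] transitions; // transitions[state][label] = next state, or -1
  private final int[] path;          // path[i] = state reached after i accepted bytes

  PathWalker(int[][] transitions, int maxTermLength) {
    this.transitions = transitions;
    this.path = new int[maxTermLength + 1];
  }

  /** Walks term through the DFA, recording each intermediate state.
   *  Returns term.length on success, or the position of the first dead end. */
  int walk(byte[] term) {
    path[0] = 0; // start state
    for (int i = 0; i < term.length; i++) {
      int next = transitions[path[i]][term[i] & 0xff];
      if (next == -1) {
        return i; // dead end; path[0..i] still holds the valid prefix states
      }
      path[i + 1] = next;
    }
    return term.length;
  }

  /** On a dead end, increments the deepest byte that can still be bumped and
   *  returns that position, so the walk can resume in state path[position]
   *  rather than from the start. Returns -1 if the term is exhausted. */
  int backtrack(byte[] term, int deadEnd) {
    for (int pos = deadEnd; pos >= 0; pos--) {
      if ((term[pos] & 0xff) < 0xff) {
        term[pos]++;
        return pos;
      }
    }
    return -1;
  }

  int stateAt(int position) {
    return path[position];
  }
}
```

With this shape, a caller that gets a dead end at position i can call backtrack(), then continue walking from stateAt(pos) over only the suffix, instead of re-walking the entire term.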

      1. LUCENE-2672.patch
        3 kB
        Robert Muir
      2. LUCENE-2672.patch
        3 kB
        Robert Muir
      3. LUCENE-2672.patch
        5 kB
        Robert Muir

        Activity

        Robert Muir added a comment -

        Here is a prototype patch; it needs more testing and benchmarking.

        Robert Muir added a comment -

        Here's another iteration: now that we have saved state, it turns a run() into a single step().

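A rough sketch of what the run()-to-step() change means, assuming path information saved in an int[] as the issue description proposes (the names here are hypothetical, not the actual patch code): once the states along the prefix are recorded, re-checking a term whose byte at some position was incremented takes a single transition from the saved state, instead of re-running the DFA over the whole prefix.

```java
// Hypothetical illustration of run() vs. step() after the byte at `pos`
// has been incremented. `transitions` and `path` follow the int[] path
// idea from the issue description; this is not Lucene's actual code.
final class StepVsRun {
  // Re-runs the whole term from the start state: O(term length) per retry.
  static int run(int[][] transitions, byte[] term) {
    int state = 0;
    for (byte b : term) {
      state = transitions[state][b & 0xff];
      if (state == -1) {
        return -1; // dead end
      }
    }
    return state;
  }

  // With path[] saved, only the changed byte needs one transition: O(1).
  static int step(int[][] transitions, int[] path, byte[] term, int pos) {
    return transitions[path[pos]][term[pos] & 0xff];
  }
}
```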
        Robert Muir added a comment -

        OK, here's a committable patch.

        I put a safety in here to address my own concerns, so the optimization doesn't apply to infinite DFAs
        (but these typically don't backtrack anyway).

        I found a little perf problem with Standard's terms dict cache: we should avoid clone() on these deep hierarchies
        if there's a chance it will get called a lot. Since the class in question is private static, I changed how clone() was implemented.

        I also turned off the terms dict cache for automaton queries; it doesn't seem to help in any query I test, and for some
        worst-case ones it slows things down (even with the cloning fix)... and queries like this "trash" the cache anyway.

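For context on the clone() remark: one common reason clone() gets expensive in deep hierarchies is that each level's override calls super.clone() and then patches its own fields, so the whole chain runs on every call; a direct field copy sidesteps the Object.clone() machinery entirely. The sketch below uses a made-up name (CachedTermState), not the private static class the comment refers to.

```java
// Illustrative only: CachedTermState is a made-up name, not the actual
// private static terms-dict-cache class changed in the patch.
class CachedTermState implements Cloneable {
  long ord;
  long filePointer;

  // Typical clone() pattern: delegates to Object.clone(); in a deep
  // hierarchy every superclass override in the chain runs as well.
  @Override
  public CachedTermState clone() {
    try {
      return (CachedTermState) super.clone();
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e);
    }
  }

  // Direct copy: a plain allocation plus field assignments, no clone() chain.
  CachedTermState copy() {
    CachedTermState s = new CachedTermState();
    s.ord = ord;
    s.filePointer = filePointer;
    return s;
  }
}
```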
        Robert Muir added a comment -

        Committed revision 1001781.

        I left out the terms dict cache clone() thing; it's something we can revisit if we ever need to...
        I'm not sure it's a big deal, especially if we can use the cache more smartly (such as LUCENE-2674).


          People

          • Assignee:
            Robert Muir
          • Reporter:
            Robert Muir
          • Votes:
            0
          • Watchers:
            0
