Lucene - Core
  1. Lucene - Core
  2. LUCENE-5752

Explore light weight Automaton replacement

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This effort started with the patch on LUCENE-4556, to create a "light
      weight" replacement for the current object-heavy Automaton class
      (which creates separate State and Transition objects).

      I took that initial patch much further, and cutover most places in
      Lucene that use Automaton to LightAutomaton. Tests pass.

      The core idea of LightAutomaton is all states are ints, and you build
      up the automaton under the restriction that you add all outgoing
      transitions one state at a time. This worked well for most
      operations, but for some (e.g. UTF32ToUTF8!!) it was harder, so I also
      added a separate builder to add transitions in any order and then in
      the end they are sorted and added to the real automaton.

      If this is successful I think we should just replace the current
      Automaton with LightAutomaton; right now they both exist in my current
      patch...

      This is very much a work in progress, and I'm not sure the
      restrictions the API imposes are "reasonable" (some algos got uglier).
      But I think it's at least worth exploring/iterating... I'll make a branch and
      commit my current state.

      1. LUCENE-5752.patch
        510 kB
        Michael McCandless
      2. LUCENE-5752.patch
        429 kB
        Michael McCandless

        Activity

        Hide
        ASF subversion and git services added a comment -

        Commit 1601966 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1601966 ]

        LUCENE-5752: make branch

        Show
        ASF subversion and git services added a comment - Commit 1601966 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1601966 ] LUCENE-5752 : make branch
        Hide
        ASF subversion and git services added a comment -

        Commit 1601967 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1601967 ]

        LUCENE-5752: current state

        Show
        ASF subversion and git services added a comment - Commit 1601967 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1601967 ] LUCENE-5752 : current state
        Hide
        Dawid Weiss added a comment -

        This is how (int-based states) morfologik stores its automaton data. Nice!

        Show
        Dawid Weiss added a comment - This is how (int-based states) morfologik stores its automaton data. Nice!
        Hide
        ASF subversion and git services added a comment -

        Commit 1602031 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1602031 ]

        LUCENE-5752: cutover suggesters

        Show
        ASF subversion and git services added a comment - Commit 1602031 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1602031 ] LUCENE-5752 : cutover suggesters
        Hide
        ASF subversion and git services added a comment -

        Commit 1602049 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1602049 ]

        LUCENE-5752: more cutover

        Show
        ASF subversion and git services added a comment - Commit 1602049 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1602049 ] LUCENE-5752 : more cutover
        Hide
        Robert Muir added a comment -

        + // nocommit can we use varargs? rob was unhappy before?
        static public Automaton concatenate(List<Automaton> l) {

        Well, an alternative is to have concatenate(String, Automaton) just to ensure the optimization is preserved for the common case,
        and to be less trappy. But I think the optimization is important and currently relied upon, because we concat tiny strings on the LHS with large ones on the RHS and depend on it not being costly.

            // adding epsilon transitions with the NFA concatenation algorithm
            // in this case always produces a resulting DFA, preventing expensive
            // redundant determinize() calls for this common case.
            boolean deterministic = a1.isSingleton() && a2.isDeterministic();
        
        Show
        Robert Muir added a comment - + // nocommit can we use varargs? rob was unhappy before? static public Automaton concatenate(List<Automaton> l) { Well, an alternative is to have concatenate(String, Automaton) just to ensure the optimization is preserved for the common case, and to be less trappy. But I think the optimization is important and currently relied upon, because we concat tiny strings on the LHS with large ones on the RHS and depend on it not being costly. // adding epsilon transitions with the NFA concatenation algorithm // in this case always produces a resulting DFA, preventing expensive // redundant determinize() calls for this common case . boolean deterministic = a1.isSingleton() && a2.isDeterministic();
        Hide
        ASF subversion and git services added a comment -

        Commit 1602228 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1602228 ]

        LUCENE-5752: finish cutover

        Show
        ASF subversion and git services added a comment - Commit 1602228 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1602228 ] LUCENE-5752 : finish cutover
        Hide
        ASF subversion and git services added a comment -

        Commit 1602279 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1602279 ]

        LUCENE-5752: move det to the testcase; clarify javadocs that getCommonPrefix requires det automaton

        Show
        ASF subversion and git services added a comment - Commit 1602279 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1602279 ] LUCENE-5752 : move det to the testcase; clarify javadocs that getCommonPrefix requires det automaton
        Hide
        ASF subversion and git services added a comment -

        Commit 1602472 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1602472 ]

        LUCENE-5752: move Transition back out; improve tests

        Show
        ASF subversion and git services added a comment - Commit 1602472 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1602472 ] LUCENE-5752 : move Transition back out; improve tests
        Hide
        Michael McCandless added a comment -

        LightAutomaton is very different from Automaton:

        • It's more like String and A is more like StringBuilder: once LA is
          built you can't link it up to another LA just by changing a few
          transitions
        • It's not mutable after it's created; operations like determinize,
          minimize, totalize, etc. don't happen "in place" anymore, and
          instead return a new LA. I'd really love to forbid calling
          e.g. determinize and not "using" its result since I could easily
          have caused bugs with this. This also means we also don't have
          ops like "cloneIfRequired" and "setAllowMutate".
        • LA knows all of its states, vs A which "defines" its states as
          those reachable from initial. This means LA can have different
          kinds of dead states than A (can reach an accept state but not
          reachable from the initial state)
        • LA doesn't have mutable state, e.g. get/set/clearNumberedStates;
          states are already numbered as they are created (and only exist as
          ints). There is no "sortTransitions"/reduce: LA's transitions are
          already reduced/sorted as they are created
        • Initial state is always 0
        • There's no special casing for singletons; it's just a normal LA
        • No setMinimizeAlways

        Unfortunately this means operations that used to "just link states
        together," like concatenate, now do a full copy of the incoming
        automata ... so the problem here is these restrictions may be too much
        for our usage. E.g. RegExp keeps chaining and chaining automata
        together... in some cases I think we can fix the usage to do the
        building directly, but other cases I'm not sure.

        Show
        Michael McCandless added a comment - LightAutomaton is very different from Automaton: It's more like String and A is more like StringBuilder: once LA is built you can't link it up to another LA just by changing a few transitions It's not mutable after it's created; operations like determinize, minimize, totalize, etc. don't happen "in place" anymore, and instead return a new LA. I'd really love to forbid calling e.g. determinize and not "using" its result since I could easily have caused bugs with this. This also means we also don't have ops like "cloneIfRequired" and "setAllowMutate". LA knows all of its states, vs A which "defines" its states as those reachable from initial. This means LA can have different kinds of dead states than A (can reach an accept state but not reachable from the initial state) LA doesn't have mutable state, e.g. get/set/clearNumberedStates; states are already numbered as they are created (and only exist as ints). There is no "sortTransitions"/reduce: LA's transitions are already reduced/sorted as they are created Initial state is always 0 There's no special casing for singletons; it's just a normal LA No setMinimizeAlways Unfortunately this means operations that used to "just link states together," like concatenate, now do a full copy of the incoming automata ... so the problem here is these restrictions may be too much for our usage. E.g. RegExp keeps chaining and chaining automata together... in some cases I think we can fix the usage to do the building directly, but other cases I'm not sure.
        Hide
        ASF subversion and git services added a comment -

        Commit 1602623 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1602623 ]

        LUCENE-5752: track isDeterministic privately in LA; also remove states not reachable from initial node in removeDeadStates; move prefix handling into LevA

        Show
        ASF subversion and git services added a comment - Commit 1602623 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1602623 ] LUCENE-5752 : track isDeterministic privately in LA; also remove states not reachable from initial node in removeDeadStates; move prefix handling into LevA
        Hide
        ASF subversion and git services added a comment -

        Commit 1602966 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1602966 ]

        LUCENE-5752: improve tests; move isEmpty out of LA into BasicOps; BasicOps.sameLanguage requires no dead states; rename LA.finish -> finishState

        Show
        ASF subversion and git services added a comment - Commit 1602966 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1602966 ] LUCENE-5752 : improve tests; move isEmpty out of LA into BasicOps; BasicOps.sameLanguage requires no dead states; rename LA.finish -> finishState
        Hide
        ASF subversion and git services added a comment -

        Commit 1602968 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1602968 ]

        LUCENE-5752: merge trunk

        Show
        ASF subversion and git services added a comment - Commit 1602968 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1602968 ] LUCENE-5752 : merge trunk
        Hide
        ASF subversion and git services added a comment -

        Commit 1603012 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1603012 ]

        LUCENE-5752: renames

        Show
        ASF subversion and git services added a comment - Commit 1603012 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1603012 ] LUCENE-5752 : renames
        Hide
        ASF subversion and git services added a comment -

        Commit 1603106 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1603106 ]

        LUCENE-5752: remove last nocommits; turn on method-level javadocs for oal.util.automaton; improve tests

        Show
        ASF subversion and git services added a comment - Commit 1603106 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1603106 ] LUCENE-5752 : remove last nocommits; turn on method-level javadocs for oal.util.automaton; improve tests
        Hide
        ASF subversion and git services added a comment -

        Commit 1603107 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1603107 ]

        LUCENE-5752: merge trunk

        Show
        ASF subversion and git services added a comment - Commit 1603107 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1603107 ] LUCENE-5752 : merge trunk
        Hide
        ASF subversion and git services added a comment -

        Commit 1603116 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1603116 ]

        LUCENE-5752: minor cleanups / remove imports / sops, etc.

        Show
        ASF subversion and git services added a comment - Commit 1603116 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1603116 ] LUCENE-5752 : minor cleanups / remove imports / sops, etc.
        Hide
        Michael McCandless added a comment -

        I think this is close to ready; here's an initial applyable patch for trunk (patch -p1 < ...) ... I'm going to review it more closely but I think it's close.

        Show
        Michael McCandless added a comment - I think this is close to ready; here's an initial applyable patch for trunk (patch -p1 < ...) ... I'm going to review it more closely but I think it's close.
        Hide
        Michael McCandless added a comment -

        I'd like to commit to trunk first and let things bake for a while .... and then later backport after 4.9 is out.

        Show
        Michael McCandless added a comment - I'd like to commit to trunk first and let things bake for a while .... and then later backport after 4.9 is out.
        Hide
        Robert Muir added a comment -

        I think the tests and docs etc look great here. So I really like that patch, I'm just worried about just a few minor things:

        • concatenate: as mentioned before, we rely on this today in quite a few places, and now the runtime has significantly changed (when the left side is a singleton)
        • singleton: speaking of such, this optimization is removed, but are we sure about this? In practice this is probably extremely effective, maybe even outweighing any other optimizations we could do.
        • regex/wildcard parsing: we should really test that this isn't totally crazy (read: blowing up) now.
        • acceptStates: should this really be a hashset? is there a reason not to use a bitset?
        Show
        Robert Muir added a comment - I think the tests and docs etc look great here. So I really like that patch, I'm just worried about just a few minor things: concatenate: as mentioned before, we rely on this today in quite a few places, and now the runtime has significantly changed (when the left side is a singleton) singleton: speaking of such, this optimization is removed, but are we sure about this? In practice this is probably extremely effective, maybe even outweighing any other optimizations we could do. regex/wildcard parsing: we should really test that this isn't totally crazy (read: blowing up) now. acceptStates: should this really be a hashset? is there a reason not to use a bitset?
        Hide
        Michael McCandless added a comment -

        Thanks Rob.

        concatenate: as mentioned before, we rely on this today in quite a few places, and now the runtime has significantly changed (when the left side is a singleton)

        Well, in RegExp we followup that concatenate with a minimize. In
        WildcardQuery the incoming automata are small anyway... and I fixed
        LevA to insert the prefix itself to avoid the full copy of the fuzzy
        suffix part..

        singleton: speaking of such, this optimization is removed, but are we sure about this? In practice this is probably extremely effective, maybe even outweighing any other optimizations we could do.

        I really didn't like this duality / mutability (how you sometimes had
        to call expandSingleton for ops that cared) and I don't see where this
        opto would really make a difference in Lucene. We already have
        DaciukMihov to efficiently build minimal union automaton ...

        I agree for a general purpose automaton library this might make sense
        ... but I don't think it really helps Lucene.

        regex/wildcard parsing: we should really test that this isn't totally crazy (read: blowing up) now.

        I was worried about this too but when I looked at RegExp it calls
        minimize after all of these ops! So I think the added cost of the
        copy is likely in the noise ...

        acceptStates: should this really be a hashset? is there a reason not to use a bitset?

        Hmm it could be a bitset. I thought that typically the number of
        accept states is small, but I agree in the case when it's large it'd
        be nice to not use way way too much RAM ... I'll change it to bitset.

        Show
        Michael McCandless added a comment - Thanks Rob. concatenate: as mentioned before, we rely on this today in quite a few places, and now the runtime has significantly changed (when the left side is a singleton) Well, in RegExp we followup that concatenate with a minimize. In WildcardQuery the incoming automata are small anyway... and I fixed LevA to insert the prefix itself to avoid the full copy of the fuzzy suffix part.. singleton: speaking of such, this optimization is removed, but are we sure about this? In practice this is probably extremely effective, maybe even outweighing any other optimizations we could do. I really didn't like this duality / mutability (how you sometimes had to call expandSingleton for ops that cared) and I don't see where this opto would really make a difference in Lucene. We already have DaciukMihov to efficiently build minimal union automaton ... I agree for a general purpose automaton library this might make sense ... but I don't think it really helps Lucene. regex/wildcard parsing: we should really test that this isn't totally crazy (read: blowing up) now. I was worried about this too but when I looked at RegExp it calls minimize after all of these ops! So I think the added cost of the copy is likely in the noise ... acceptStates: should this really be a hashset? is there a reason not to use a bitset? Hmm it could be a bitset. I thought that typically the number of accept states is small, but I agree in the case when it's large it'd be nice to not use way way too much RAM ... I'll change it to bitset.
        Hide
        Michael McCandless added a comment -

        I ran the normal luceneutil bench on wikimediumall:

                            Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                HighSloppyPhrase        3.60      (7.3%)        3.51      (7.4%)   -2.3% ( -15% -   13%)
                      HighPhrase        4.59      (5.5%)        4.54      (6.2%)   -1.1% ( -12% -   11%)
                 MedSloppyPhrase        3.59      (3.8%)        3.55      (4.8%)   -1.0% (  -9% -    7%)
                        HighTerm       63.28      (3.7%)       62.65      (4.3%)   -1.0% (  -8% -    7%)
                         MedTerm       99.13      (3.0%)       98.15      (3.6%)   -1.0% (  -7% -    5%)
                       MedPhrase      231.08      (5.4%)      229.10      (6.3%)   -0.9% ( -11% -   11%)
                        PKLookup      160.27      (2.4%)      159.32      (2.9%)   -0.6% (  -5% -    4%)
                         LowTerm      323.40      (1.9%)      321.68      (2.3%)   -0.5% (  -4% -    3%)
                 LowSloppyPhrase       45.04      (1.5%)       44.81      (2.2%)   -0.5% (  -4% -    3%)
                      AndHighLow      413.85      (1.6%)      412.50      (2.4%)   -0.3% (  -4% -    3%)
                     LowSpanNear       11.23      (3.6%)       11.20      (3.2%)   -0.2% (  -6% -    6%)
                    HighSpanNear       10.36      (5.3%)       10.33      (4.7%)   -0.2% (  -9% -   10%)
                     MedSpanNear       34.23      (3.0%)       34.16      (3.0%)   -0.2% (  -6% -    5%)
                     AndHighHigh       28.81      (0.6%)       28.79      (0.8%)   -0.1% (  -1% -    1%)
                       LowPhrase       13.51      (2.2%)       13.50      (1.6%)   -0.1% (  -3% -    3%)
                      AndHighMed       34.92      (0.5%)       34.90      (0.9%)   -0.1% (  -1% -    1%)
                          IntNRQ        3.45      (6.6%)        3.45      (6.3%)   -0.1% ( -12% -   13%)
                         Prefix3       93.31      (4.3%)       93.26      (3.5%)   -0.0% (  -7% -    8%)
                        Wildcard       20.15      (4.1%)       20.15      (2.8%)   -0.0% (  -6% -    7%)
                         Respell       49.36      (3.1%)       49.52      (2.8%)    0.3% (  -5% -    6%)
                   OrNotHighHigh       10.73      (6.3%)       10.81      (6.4%)    0.8% ( -11% -   14%)
                      OrHighHigh        9.89      (6.4%)        9.97      (6.3%)    0.8% ( -11% -   14%)
                       OrHighMed       31.86      (6.3%)       32.12      (6.4%)    0.8% ( -11% -   14%)
                   OrHighNotHigh       13.39      (6.1%)       13.51      (6.1%)    0.8% ( -10% -   13%)
                    OrHighNotMed       36.55      (5.9%)       36.88      (6.1%)    0.9% ( -10% -   13%)
                    OrNotHighMed       23.44      (6.4%)       23.65      (6.6%)    0.9% ( -11% -   14%)
                       OrHighLow       22.74      (6.6%)       22.96      (6.9%)    1.0% ( -11% -   15%)
                    OrNotHighLow       24.38      (6.7%)       24.62      (6.8%)    1.0% ( -11% -   15%)
                    OrHighNotLow       29.97      (6.5%)       30.29      (6.9%)    1.0% ( -11% -   15%)
                          Fuzzy2       45.50      (3.2%)       46.05      (3.1%)    1.2% (  -5% -    7%)
                          Fuzzy1       60.83      (4.0%)       61.71      (3.9%)    1.4% (  -6% -    9%)
        

        Net/net I think this is noise ... but Rob pointed out the Fuzzy1/2
        tasks here don't do a prefix, so I'll fixup luceneutil to support that
        and test it too.

        Show
        Michael McCandless added a comment - I ran the normal luceneutil bench on wikimediumall: Task QPS base StdDev QPS comp StdDev Pct diff HighSloppyPhrase 3.60 (7.3%) 3.51 (7.4%) -2.3% ( -15% - 13%) HighPhrase 4.59 (5.5%) 4.54 (6.2%) -1.1% ( -12% - 11%) MedSloppyPhrase 3.59 (3.8%) 3.55 (4.8%) -1.0% ( -9% - 7%) HighTerm 63.28 (3.7%) 62.65 (4.3%) -1.0% ( -8% - 7%) MedTerm 99.13 (3.0%) 98.15 (3.6%) -1.0% ( -7% - 5%) MedPhrase 231.08 (5.4%) 229.10 (6.3%) -0.9% ( -11% - 11%) PKLookup 160.27 (2.4%) 159.32 (2.9%) -0.6% ( -5% - 4%) LowTerm 323.40 (1.9%) 321.68 (2.3%) -0.5% ( -4% - 3%) LowSloppyPhrase 45.04 (1.5%) 44.81 (2.2%) -0.5% ( -4% - 3%) AndHighLow 413.85 (1.6%) 412.50 (2.4%) -0.3% ( -4% - 3%) LowSpanNear 11.23 (3.6%) 11.20 (3.2%) -0.2% ( -6% - 6%) HighSpanNear 10.36 (5.3%) 10.33 (4.7%) -0.2% ( -9% - 10%) MedSpanNear 34.23 (3.0%) 34.16 (3.0%) -0.2% ( -6% - 5%) AndHighHigh 28.81 (0.6%) 28.79 (0.8%) -0.1% ( -1% - 1%) LowPhrase 13.51 (2.2%) 13.50 (1.6%) -0.1% ( -3% - 3%) AndHighMed 34.92 (0.5%) 34.90 (0.9%) -0.1% ( -1% - 1%) IntNRQ 3.45 (6.6%) 3.45 (6.3%) -0.1% ( -12% - 13%) Prefix3 93.31 (4.3%) 93.26 (3.5%) -0.0% ( -7% - 8%) Wildcard 20.15 (4.1%) 20.15 (2.8%) -0.0% ( -6% - 7%) Respell 49.36 (3.1%) 49.52 (2.8%) 0.3% ( -5% - 6%) OrNotHighHigh 10.73 (6.3%) 10.81 (6.4%) 0.8% ( -11% - 14%) OrHighHigh 9.89 (6.4%) 9.97 (6.3%) 0.8% ( -11% - 14%) OrHighMed 31.86 (6.3%) 32.12 (6.4%) 0.8% ( -11% - 14%) OrHighNotHigh 13.39 (6.1%) 13.51 (6.1%) 0.8% ( -10% - 13%) OrHighNotMed 36.55 (5.9%) 36.88 (6.1%) 0.9% ( -10% - 13%) OrNotHighMed 23.44 (6.4%) 23.65 (6.6%) 0.9% ( -11% - 14%) OrHighLow 22.74 (6.6%) 22.96 (6.9%) 1.0% ( -11% - 15%) OrNotHighLow 24.38 (6.7%) 24.62 (6.8%) 1.0% ( -11% - 15%) OrHighNotLow 29.97 (6.5%) 30.29 (6.9%) 1.0% ( -11% - 15%) Fuzzy2 45.50 (3.2%) 46.05 (3.1%) 1.2% ( -5% - 7%) Fuzzy1 60.83 (4.0%) 61.71 (3.9%) 1.4% ( -6% - 9%) Net/net I think this is noise ... but Rob pointed out the Fuzzy1/2 tasks here don't do a prefix, so I'll fixup luceneutil to support that and test it too.
        Hide
        ASF subversion and git services added a comment -

        Commit 1603489 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1603489 ]

        LUCENE-5752: cutover to FixedBitSet to mark the accept states; improve javadocs

        Show
        ASF subversion and git services added a comment - Commit 1603489 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1603489 ] LUCENE-5752 : cutover to FixedBitSet to mark the accept states; improve javadocs
        Hide
        Michael McCandless added a comment -

        I fixed the acceptStates to use a FixedBitSet, and made
        A.getAcceptStates package private.

        I temporarily set luceneutil's QueryParser to use prefix=1 and 2
        for FuzzyQuery and re-tested and saw no perf change ... results were
        noisy though.

        I ran the suggester benchmark on trunk vs patch (and hit
        LUCENE-5775) and the numbers are all very close.

        I also ran core tests (time ant test -Dtests.seed=0) on trunk vs
        branch and the time was 2 min 26 seconds on trunk, 2 min 25 seconds on
        branch.

        Net/net I think perf is fine. I think this is ready ... I'll post the
        latest applyable patch.

        Show
        Michael McCandless added a comment - I fixed the acceptStates to use a FixedBitSet, and made A.getAcceptStates package private. I temporarily set luceneutil's QueryParser to use prefix=1 and 2 for FuzzyQuery and re-tested and saw no perf change ... results were noisy though. I ran the suggester benchmark on trunk vs patch (and hit LUCENE-5775 ) and the numbers are all very close. I also ran core tests (time ant test -Dtests.seed=0) on trunk vs branch and the time was 2 min 26 seconds on trunk, 2 min 25 seconds on branch. Net/net I think perf is fine. I think this is ready ... I'll post the latest applyable patch.
        Hide
        ASF subversion and git services added a comment -

        Commit 1603492 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1603492 ]

        LUCENE-5752: merge trunk

        Show
        ASF subversion and git services added a comment - Commit 1603492 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1603492 ] LUCENE-5752 : merge trunk
        Hide
        Michael McCandless added a comment -

        Applyable patch.

        Show
        Michael McCandless added a comment - Applyable patch.
        Hide
        ASF subversion and git services added a comment -

        Commit 1603516 from Michael McCandless in branch 'dev/branches/lucene5752'
        [ https://svn.apache.org/r1603516 ]

        LUCENE-5752: use BitSet for accept states

        Show
        ASF subversion and git services added a comment - Commit 1603516 from Michael McCandless in branch 'dev/branches/lucene5752' [ https://svn.apache.org/r1603516 ] LUCENE-5752 : use BitSet for accept states
        Hide
        ASF subversion and git services added a comment -

        Commit 1603752 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1603752 ]

        LUCENE-5752: switch to simpler Automaton implementation

        Show
        ASF subversion and git services added a comment - Commit 1603752 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1603752 ] LUCENE-5752 : switch to simpler Automaton implementation
        Hide
        ASF subversion and git services added a comment -

        Commit 1604283 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1604283 ]

        LUCENE-5752: switch to simpler, immutable Automaton implementation

        Show
        ASF subversion and git services added a comment - Commit 1604283 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1604283 ] LUCENE-5752 : switch to simpler, immutable Automaton implementation

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development