Lucene - Core
  1. Lucene - Core
  2. LUCENE-6548

Optimize BlockTree's Terms.intersect a bit for "very finite" automata

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I've been digging into why BlockTree's Terms.intersect impl is slower for a "very finite" union-of-terms test over random IDs (LUCENE-3893) and I found a few performance improvements that at least for that one use case gets a ~11% speedup.

      1. LUCENE-6548.patch
        54 kB
        Michael McCandless
      2. LUCENE-6548.patch
        28 kB
        Michael McCandless
      3. LUCENE-6548.patch
        28 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Initial patch ... I still need to run full benchmarks to make sure that "not so finite" LevN automata (FuzzyQuery) didn't get slower.

        The patch is bigger than it looks because I also factored out some methods from the way-too-big IntersectTermsEnum.next.

        The gist of the speedup is to avoid calling RunAutomaton.step on the lead byte of a term's suffix since we already know whether this byte matches the current transition.

        Show
        Michael McCandless added a comment - Initial patch ... I still need to run full benchmarks to make sure that "not so finite" LevN automata (FuzzyQuery) didn't get slower. The patch is bigger than it looks because I also factored out some methods from the way-too-big IntersectTermsEnum.next. The gist of the speedup is to avoid calling RunAutomaton.step on the lead byte of a term's suffix since we already know whether this byte matches the current transition.
        Hide
        Michael McCandless added a comment -

        New patch, with feedback from Rob to use single instance for the private "end" exception.

        Show
        Michael McCandless added a comment - New patch, with feedback from Rob to use single instance for the private "end" exception.
        Hide
        Michael McCandless added a comment -

        Another iteration, still work in progress ... this one adds an index format change to try to recover some of the perf loss when we added auto-prefix ...

        Show
        Michael McCandless added a comment - Another iteration, still work in progress ... this one adds an index format change to try to recover some of the perf loss when we added auto-prefix ...
        Hide
        Michael McCandless added a comment -

        I'll commit the last patch soon; I think there are other things we can do here, but we can explore them later and this is already a good speedup, at lease in the "very finite" (arcs that typically have only one label, not a range) automata intersect case.

        Show
        Michael McCandless added a comment - I'll commit the last patch soon; I think there are other things we can do here, but we can explore them later and this is already a good speedup, at lease in the "very finite" (arcs that typically have only one label, not a range) automata intersect case.
        Hide
        ASF subversion and git services added a comment -

        Commit 1687125 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1687125 ]

        LUCENE-6548: some optimizations to BlockTree's intersect method with very finite automata

        Show
        ASF subversion and git services added a comment - Commit 1687125 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1687125 ] LUCENE-6548 : some optimizations to BlockTree's intersect method with very finite automata
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development