Lucene - Core
  1. Lucene - Core
  2. LUCENE-1594

Use source code specialization to maximize search performance

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Towards eeking absolute best search performance, and after seeing the
      Java ghosts in LUCENE-1575, I decided to build a simple prototype
      source code specializer for Lucene's searches.

      The idea is to write dynamic Java code, specialized to run a very
      specific query context (eg TermQuery, collecting top N by field, no
      filter, no deletions), compile that Java code, and run it.

      Here're the performance gains when compared to trunk:

      Query Sort Filt Deletes Scoring Hits QPS (base) QPS (new) %
      1 Date (long) no no Track,Max 2561886 6.8 10.6 55.9%
      1 Date (long) no 5% Track,Max 2433472 6.3 10.5 66.7%
      1 Date (long) 25% no Track,Max 640022 5.2 9.9 90.4%
      1 Date (long) 25% 5% Track,Max 607949 5.3 10.3 94.3%
      1 Date (long) 10% no Track,Max 256300 6.7 12.3 83.6%
      1 Date (long) 10% 5% Track,Max 243317 6.6 12.6 90.9%
      1 Relevance no no Track,Max 2561886 11.2 17.3 54.5%
      1 Relevance no 5% Track,Max 2433472 10.1 15.7 55.4%
      1 Relevance 25% no Track,Max 640022 6.1 14.1 131.1%
      1 Relevance 25% 5% Track,Max 607949 6.2 14.4 132.3%
      1 Relevance 10% no Track,Max 256300 7.7 15.6 102.6%
      1 Relevance 10% 5% Track,Max 243317 7.6 15.9 109.2%
      1 Title (string) no no Track,Max 2561886 7.8 12.5 60.3%
      1 Title (string) no 5% Track,Max 2433472 7.5 11.1 48.0%
      1 Title (string) 25% no Track,Max 640022 5.7 11.2 96.5%
      1 Title (string) 25% 5% Track,Max 607949 5.5 11.3 105.5%
      1 Title (string) 10% no Track,Max 256300 7.0 12.7 81.4%
      1 Title (string) 10% 5% Track,Max 243317 6.7 13.2 97.0%

      Those tests were run on a 19M doc wikipedia index (splitting each
      Wikipedia doc @ ~1024 chars), on Linux, Java 1.6.0_10

      But: it only works with TermQuery for now; it's just a start.

      It should be easy for others to run this test:

      • apply patch
      • cd contrib/benchmark
      • run python -u bench.py -delindex </path/to/index/with/deletes>
        -nodelindex </path/to/index/without/deletes>

      (You can leave off one of -delindex or -nodelindex and it'll skip
      those tests).

      For each test, bench.py generates a single Java source file that runs
      that one query; you can open
      contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/FastSearchTask.java
      to see it. I'll attach an example. It writes "results.txt", in Jira
      table format, which you should be able to copy/paste back here.

      The specializer uses pretty much every search speedup I can think of
      – the ones from LUCENE-1575 (to score or not, to maxScore or not),
      the ones suggested in the spinoff LUCENE-1593 (pre-fill w/ sentinels,
      don't use docID for tie breaking), LUCENE-1536 (random access
      filters). It bypasses TermDocs and interacts directly with the
      IndexInput, and with BitVector for deletions. It directly folds in
      the collector, if possible. A filter if used must be random access,
      and is assumed to pre-multiply-in the deleted docs.

      Current status:

      • I only handle TermQuery. I'd like to add others over time...
      • It can collect by score, or single field (with the 3 scoring
        options in LUCENE-1575). It can't do reverse field sort nor
        multi-field sort now.
      • The auto-gen code (gen.py) is rather hideous. It could use some
        serious refactoring, etc.; I think we could get it to the point
        where each Query can gen its own specialized code, maybe. It also
        needs to be eventually ported to Java.
      • The script runs old, then new, then checks that the topN results
        are identical, and aborts if not. So I'm pretty sure the
        specialized code is working correctly, for the cases I'm testing.
      • The patch includes a few small changes to core, mostly to open up
        package protected APIs so I can access stuff

      I think this is an interesting effort for several reasons:

      • It gives us a best-case upper bound performance we can expect from
        Lucene's normal search classes (minus algorithmic improvements eg
        PFOR) because it makes life as easy as possible on the
        compiler/JRE to convert to assembly.
      • We can spin out optimization ideas from this back into the core
        (eg LUCENE-1593 already has one example), and prioritize. EG I
        think given these results, optimizing for filters that support
        random-access API is important. As we fold speedups back into
        core, the gains from specialization will naturally decrease.
      • Eventually (maybe, eg as a future "experimental" module) this can
        be used in production as a simple "search wrapper". Ie, for a
        given query, the specializer is checked. If the query "matches"
        what the specializer can handle, then the specialized code is run;
        else we fallback to Lucene core. Likely one would pre-compile the
        space of all specializations, or we could compile java-on-the-fly
        (eg what a JSP source does when it's changed) but I'm not sure how
        costly/portable that is.
      1. LUCENE-1594.patch
        51 kB
        Michael McCandless
      2. FastSearchTask.java
        5 kB
        Michael McCandless
      3. LUCENE-1594.patch
        1.65 MB
        Michael McCandless
      4. LUCENE-1594.patch
        71 kB
        Michael McCandless
      5. LUCENE-1594.patch
        131 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Initial patch.

        Show
        Michael McCandless added a comment - Initial patch.
        Hide
        Michael McCandless added a comment -

        Example of what the specialized code looks like. This one sorts by relevance, and does no filter & no deletions.

        Show
        Michael McCandless added a comment - Example of what the specialized code looks like. This one sorts by relevance, and does no filter & no deletions.
        Hide
        Michael McCandless added a comment -

        Another iteration... this patch is very large because I'm including
        all the generated classes. Some changes:

        • I moved everying under a new contrib/spec
        • There is now a simple "FastSearch" class that you can use to run
          searches. This makes it very simple to try out – if the
          specializer can handle it, it will; else it falls back to
          IndexSearcher's search methods.
        • Two-term boolean OR query is now covered
        • String (ord/val) search is now specialized

        The code is still horrific and there are many cases not handled. Very
        early in the iterations still...
        (reversed, multi-field, other query types, etc.).

        Show
        Michael McCandless added a comment - Another iteration... this patch is very large because I'm including all the generated classes. Some changes: I moved everying under a new contrib/spec There is now a simple "FastSearch" class that you can use to run searches. This makes it very simple to try out – if the specializer can handle it, it will; else it falls back to IndexSearcher's search methods. Two-term boolean OR query is now covered String (ord/val) search is now specialized The code is still horrific and there are many cases not handled. Very early in the iterations still... (reversed, multi-field, other query types, etc.).
        Hide
        Michael McCandless added a comment -

        New patch attached:

        • Specialize for the no norms cases
        • N-clause BooleanQuery of TermQuerys now handled
        • Handle "setMinimumNumberShouldMatch"
        • MUST_NOT clauses handled
        • Allow total hits to NOT be computed, and then when sorting by
          field, do a fail fast on a doc while iterating the TermDocs if the
          doc can't compete in the current PQ (discussed under LUCENE-1593)
        • Pre-replace nulls with U+0000 in StringIndex
        • Other random optimizations

        Patch is small because I'm not including all generated sources (there
        are too many).

        This patch always pre-fills the queue w/ sentinel values.

        These optimizations result is very sizable performance gains,
        especially with OR queries that sort by field, do not require total
        hit count (with or without filtering, deletions, scoring, etc.). In
        these cases the specialized code runs 2.5-3.5X faster than Lucene
        core.

        Show
        Michael McCandless added a comment - New patch attached: Specialize for the no norms cases N-clause BooleanQuery of TermQuerys now handled Handle "setMinimumNumberShouldMatch" MUST_NOT clauses handled Allow total hits to NOT be computed, and then when sorting by field, do a fail fast on a doc while iterating the TermDocs if the doc can't compete in the current PQ (discussed under LUCENE-1593 ) Pre-replace nulls with U+0000 in StringIndex Other random optimizations Patch is small because I'm not including all generated sources (there are too many). This patch always pre-fills the queue w/ sentinel values. These optimizations result is very sizable performance gains, especially with OR queries that sort by field, do not require total hit count (with or without filtering, deletions, scoring, etc.). In these cases the specialized code runs 2.5-3.5X faster than Lucene core.
        Hide
        Eks Dev added a comment -

        huh, it reduces hardware costs 2-3 times for larger setup! great

        Show
        Eks Dev added a comment - huh, it reduces hardware costs 2-3 times for larger setup! great
        Hide
        Michael McCandless added a comment -

        Another iteration. Many changes, eg:

        • All BooleanQuery clauses are now handled (MUST, SHOULD,
          MUST_NOT). But clauses must still be TermQuery.
        • Both chunk (like BooleanScorer) and "doc at once" (like
          BooleanScorer2) scoring can be generated, except "doc at once"
          requires at least one MUST clause (or non-random-access filter).
        • Iteration only filters are rewritten as a MUST clause on a
          BooleanQuery
        • Decoupled "structural optimizations" (rewriting BooleanQuery to
          equivalent faster queries) from searching/specializing; much of
          this is the same as what BooleanQuery.scorer does, but some is new
          eg I drop all SHOULD clauses if minNR is 0 and no scoring is
          done.
        • The gen code (Python) is getting somewhat cleaner but still far
          from porting to Java.
        • Created a unit test (TestSpecializedSearch) that enumerates all
          the specializations that can be handled, confirming that results
          from core Lucene match the specializer's results
        • Many micro optimizations
        Show
        Michael McCandless added a comment - Another iteration. Many changes, eg: All BooleanQuery clauses are now handled (MUST, SHOULD, MUST_NOT). But clauses must still be TermQuery. Both chunk (like BooleanScorer) and "doc at once" (like BooleanScorer2) scoring can be generated, except "doc at once" requires at least one MUST clause (or non-random-access filter). Iteration only filters are rewritten as a MUST clause on a BooleanQuery Decoupled "structural optimizations" (rewriting BooleanQuery to equivalent faster queries) from searching/specializing; much of this is the same as what BooleanQuery.scorer does, but some is new eg I drop all SHOULD clauses if minNR is 0 and no scoring is done. The gen code (Python) is getting somewhat cleaner but still far from porting to Java. Created a unit test (TestSpecializedSearch) that enumerates all the specializations that can be handled, confirming that results from core Lucene match the specializer's results Many micro optimizations
        Hide
        Erick Erickson added a comment -

        SPRING_CLEANING_2013. We can reopen if there's interest.

        Hey Mike:

        Guessing this has been superseded by all the work that's gone on since you opened this one almost 4 years ago.....

        Show
        Erick Erickson added a comment - SPRING_CLEANING_2013. We can reopen if there's interest. Hey Mike: Guessing this has been superseded by all the work that's gone on since you opened this one almost 4 years ago.....

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development