Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Similar to LUCENE-2761:

      When we call advance(), it scans forward after skipping, but this scan can be optimized instead of calling nextDoc() in a loop as it does today:

            // scan for the rest:
            do {
              nextDoc();
            } while (target > doc);
      

      In particular, the freq can be "skipVInted" (its vInt bytes stepped over without being decoded), and the skipDocs (deletedDocs) don't need to be checked for the documents passed over during this scan.
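      A minimal sketch of the scanning idea above, assuming a simplified postings layout: each entry is a vInt holding the doc delta shifted left by one, with the low bit set when freq is 1, otherwise followed by the freq as a second vInt. The class AdvanceScanSketch, its methods, and this encoding are illustrative assumptions; they are not the code in the attached patch and not Lucene's actual codec classes.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;

// Hypothetical, simplified model of a postings list; names are illustrative only.
class AdvanceScanSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Writes a variable-length int: 7 bits per byte, high bit set when more bytes follow.
  static void writeVInt(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  // Assumed layout: (delta << 1) | 1 when freq == 1, else (delta << 1) followed by freq.
  static byte[] encode(int[] docs, int[] freqs) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int last = 0;
    for (int i = 0; i < docs.length; i++) {
      int delta = docs[i] - last;
      last = docs[i];
      if (freqs[i] == 1) {
        writeVInt(out, (delta << 1) | 1);
      } else {
        writeVInt(out, delta << 1);
        writeVInt(out, freqs[i]);
      }
    }
    return out.toByteArray();
  }

  static final class Postings {
    final byte[] buf;
    final int count;
    final BitSet deleted;
    int pos, read, accum, freq;
    int doc = -1;

    Postings(byte[] buf, int count, BitSet deleted) {
      this.buf = buf;
      this.count = count;
      this.deleted = deleted;
    }

    int readVInt() {
      byte b = buf[pos++];
      int v = b & 0x7F;
      for (int shift = 7; (b & 0x80) != 0; shift += 7) {
        b = buf[pos++];
        v |= (b & 0x7F) << shift;
      }
      return v;
    }

    // Steps over one vInt without decoding its value.
    void skipVInt() {
      while ((buf[pos++] & 0x80) != 0) { }
    }

    // Decodes the freq and checks deletions for every document it visits.
    int nextDoc() {
      while (read < count) {
        int code = readVInt();
        read++;
        accum += code >>> 1;
        freq = (code & 1) != 0 ? 1 : readVInt();
        if (!deleted.get(accum)) return doc = accum;
      }
      return doc = NO_MORE_DOCS;
    }

    // The scan as it is written today: nextDoc() in a loop.
    int advanceBaseline(int target) {
      do {
        nextDoc();
      } while (target > doc);
      return doc;
    }

    // Specialized scan: docs below target have their freq skipped and are
    // never checked against the deleted set; only the landing doc is.
    int advanceScan(int target) {
      while (read < count) {
        int code = readVInt();
        read++;
        accum += code >>> 1;
        if (accum >= target) {
          freq = (code & 1) != 0 ? 1 : readVInt();
          return deleted.get(accum) ? nextDoc() : (doc = accum);
        }
        if ((code & 1) == 0) skipVInt();
      }
      return doc = NO_MORE_DOCS;
    }
  }

  public static void main(String[] args) {
    BitSet deleted = new BitSet();
    deleted.set(7);
    int[] docs = {1, 4, 7, 9, 20};
    byte[] buf = encode(docs, new int[] {1, 3, 200, 1, 2});
    Postings p = new Postings(buf, docs.length, deleted);
    System.out.println("doc=" + p.advanceScan(5) + " freq=" + p.freq);
  }
}
```

      Both advance variants should land on the same document with the same freq; the specialized one simply does less work per document it passes over, which is where any gain would come from.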

      1. LUCENE-2765.patch
        11 kB
        Robert Muir
      2. LUCENE-2765.patch
        11 kB
        Robert Muir

        Activity

        Uwe Schindler added a comment -

        Move issue to Lucene 4.9.

        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Robert Muir added a comment -

        Here are Mike's results on his Wikipedia index (multi-segment, 5% deletions) with the patch.

        Query                              QPS base  QPS spec  Pct diff
        "unit state"                           7.94      7.84     -1.3%
        state                                 36.15     35.81     -1.0%
        spanNear([unit, state], 10, true)      4.46      4.42     -0.9%
        spanFirst(unit, 5)                    16.51     16.45     -0.4%
        unit state                            10.76     10.78      0.1%
        unit~2.0                              13.83     14.06      1.7%
        unit~1.0                              14.36     14.69      2.3%
        uni*                                  15.57     16.02      2.9%
        unit*                                 27.29     28.26      3.5%
        +unit +state                          11.73     12.31      4.9%
        united~1.0                            29.01     30.86      6.4%
        un*d                                  66.52     70.99      6.7%
        u*d                                   21.29     22.98      7.9%
        united~2.0                             6.48      7.07      9.1%
        +nebraska +state                     169.87    188.95     11.2%
        Robert Muir added a comment -

        I ran a quick, very rough check with an AND query (3149 results for this query).
        I didn't benchmark the omitTF case (but it should be better too).

        All times are in milliseconds.

            // searcher: an IndexSearcher already opened on the test index (not shown)
            QueryParser qp = new QueryParser(Version.LUCENE_CURRENT, "body", new MockAnalyzer());
            Query query = qp.parse("+the +america");
            // run once up front and print the hit count
            System.out.println(searcher.search(query, 10).totalHits);
            // time 1000 repetitions of the same top-10 search
            long ms = System.currentTimeMillis();
            for (int i = 0; i < 1000; i++) {
              searcher.search(query, 10);
            }
            long ms2 = System.currentTimeMillis();
            System.out.println("time = " + (ms2 - ms));
        
        setup        run1  run2  run3  run4  run5  run6
        trunk        1707  1706  1709  1704  1704  1703
        LUCENE-2765  1628  1623  1641  1624  1627  1628

        Seems worth it to me.

        Robert Muir added a comment -

        My mistake, I left an extra check in the code. Here's the updated patch.

        Robert Muir added a comment -

        Here's a patch; it can probably be beautified/optimized further.

        Needs benchmarking.

        Robert Muir added a comment -

        Also, another idea like LUCENE-2761 is to specialize the omitTF case here...


          People

          • Assignee: Robert Muir
          • Reporter: Robert Muir
          • Votes: 0
          • Watchers: 0
