Lucene - Core
LUCENE-1997

Explore performance of multi-PQ vs single-PQ sorting API

    Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.9
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
      where a simpler (non-segment-based) comparator API is proposed that
      gathers results into multiple PQs (one per segment) and then merges
      them in the end.
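
      The two strategies under comparison can be sketched in miniature as
      follows (a hypothetical Python sketch using made-up (doc, score)
      tuples; the real code is Java in contrib/benchmark and sorts by
      field values via comparators, not by score):

```python
import heapq

def top_n_single_pq(segments, n):
    """Single-PQ: one bounded heap shared across all segments."""
    pq = []  # min-heap of (score, doc) pairs; weakest competitive hit on top
    for seg in segments:
        for doc, score in seg:
            if len(pq) < n:
                heapq.heappush(pq, (score, doc))
            elif score > pq[0][0]:
                heapq.heapreplace(pq, (score, doc))
    return sorted(pq, reverse=True)

def top_n_multi_pq(segments, n):
    """Multi-PQ: one bounded heap per segment, merged once at the end."""
    per_seg = []
    for seg in segments:
        pq = []
        for doc, score in seg:
            if len(pq) < n:
                heapq.heappush(pq, (score, doc))
            elif score > pq[0][0]:
                heapq.heapreplace(pq, (score, doc))
        per_seg.append(pq)
    # Merge step: pick the overall top N across the per-segment queues.
    return heapq.nlargest(n, (hit for pq in per_seg for hit in pq))
```

      Both return the same top N; the question this issue explores is
      which bookkeeping is cheaper in practice.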

      I started from John's multi-PQ code and worked it into
      contrib/benchmark so that we could run perf tests. Then I generified
      the Python script I use for running search benchmarks (in
      contrib/benchmark/sortBench.py).

      The script first creates indexes with 1M docs (based on
      SortableSingleDocSource, and based on wikipedia, if available). Then
      it runs various combinations:

      • Index with 20 balanced segments vs index with the "normal" log
        segment size
      • Queries with different numbers of hits (only for wikipedia index)
      • Different top N
      • Different sorts (by title, for wikipedia, and by random string,
        random int, and country for the random index)

      For each test, 7 search rounds are run and the best QPS is kept. The
      script runs singlePQ then multiPQ, records the resulting best QPS
      for each, and produces a table (in Jira format) as output.
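
      The measurement boils down to best-of-7 timing plus the "Pct change"
      column seen in the result tables below (a hypothetical Python
      sketch; the actual logic lives in sortBench.py and may differ):

```python
import time

ROUNDS = 7  # best of 7 search rounds, as in the benchmark

def best_qps(run_batch, num_queries, rounds=ROUNDS):
    """Time the query batch several times and keep the best QPS seen."""
    best = 0.0
    for _ in range(rounds):
        start = time.time()
        run_batch()
        best = max(best, num_queries / (time.time() - start))
    return best

def pct_change(qps_old, qps_new):
    """The 'Pct change' column: positive means the new (multi-PQ) code is faster."""
    return 100.0 * (qps_new - qps_old) / qps_old
```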

      1. LUCENE-1997.patch
        59 kB
        Michael McCandless
      2. LUCENE-1997.patch
        49 kB
        Michael McCandless
      3. LUCENE-1997.patch
        48 kB
        Michael McCandless
      4. LUCENE-1997.patch
        46 kB
        Michael McCandless
      5. LUCENE-1997.patch
        46 kB
        Michael McCandless
      6. LUCENE-1997.patch
        45 kB
        Michael McCandless
      7. LUCENE-1997.patch
        45 kB
        Michael McCandless
      8. LUCENE-1997.patch
        40 kB
        Michael McCandless
      9. LUCENE-1997.patch
        39 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Attached patch.

        Note that the patch is based on the 2.9.x branch, so first check
        out 2.9.x, apply the patch, then:

        cd contrib/benchmark
        ant compile
        <edit constants @ top of sortBench.py>
        python -u sortBench.py -run results
        python -u sortBench.py -report results

        The important constants are INDEX_DIR_BASE (where created indexes are
        stored) and WIKI_FILE (which points to a .tar.bz2 or .tar export of
        wikipedia; if this file can't be found the script just skips the
        wikipedia tests). You can also change INDEX_NUM_DOCS and INDEX_NUM_THREADS.

        If you don't have the wiki export downloaded, that's fine... the
        script should just run the tests based on the random index.

        Michael McCandless added a comment -

        OK, I ran sortBench.py on an OpenSolaris 2009.06 box, Java 1.6.0_13.

        It'd be great if others with more mainstream platforms (Linux,
        Windows) could run this and post back.

        Raw results (only ran on the log-sized segments):

        Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        log 1 318481 title 10 114.26 112.40 -1.6%
        log 1 318481 title 25 117.59 110.08 -6.4%
        log 1 318481 title 50 116.22 106.96 -8.0%
        log 1 318481 title 100 114.48 100.07 -12.6%
        log 1 318481 title 500 103.16 73.98 -28.3%
        log 1 318481 title 1000 95.60 57.85 -39.5%
        log <all> 1000000 title 10 95.71 109.41 14.3%
        log <all> 1000000 title 25 111.56 101.73 -8.8%
        log <all> 1000000 title 50 110.56 98.84 -10.6%
        log <all> 1000000 title 100 104.09 93.02 -10.6%
        log <all> 1000000 title 500 93.36 66.67 -28.6%
        log <all> 1000000 title 1000 97.07 50.03 -48.5%
        log <all> 1000000 rand string 10 118.10 109.63 -7.2%
        log <all> 1000000 rand string 25 107.68 102.33 -5.0%
        log <all> 1000000 rand string 50 107.12 100.37 -6.3%
        log <all> 1000000 rand string 100 110.63 95.17 -14.0%
        log <all> 1000000 rand string 500 79.97 72.09 -9.9%
        log <all> 1000000 rand string 1000 76.82 54.67 -28.8%
        log <all> 1000000 country 10 129.49 103.63 -20.0%
        log <all> 1000000 country 25 111.74 102.60 -8.2%
        log <all> 1000000 country 50 108.82 100.90 -7.3%
        log <all> 1000000 country 100 108.01 96.84 -10.3%
        log <all> 1000000 country 500 97.60 72.02 -26.2%
        log <all> 1000000 country 1000 85.19 54.56 -36.0%
        log <all> 1000000 rand int 10 151.75 110.37 -27.3%
        log <all> 1000000 rand int 25 138.06 109.15 -20.9%
        log <all> 1000000 rand int 50 135.40 106.49 -21.4%
        log <all> 1000000 rand int 100 108.30 101.86 -5.9%
        log <all> 1000000 rand int 500 94.45 73.42 -22.3%
        log <all> 1000000 rand int 1000 88.30 54.71 -38.0%

        Some observations:

        • MultiPQ seems like it's generally slower, though it is faster in
          one case: topN = 10, sorting by title. It's only faster with
          the *:* (MatchAllDocsQuery) query, not with the TermQuery for
          term=1, which is odd.
        • MultiPQ slows down, relatively, as topN increases.
        • Sorting by int acts differently: MultiPQ is quite a bit slower
          across the board, except for topN=100.
        Michael McCandless added a comment -

        New patch attached:

        • Turn off testing on the balanced index by default (set DO_BALANCED to True if you want to change this)
        • Minor formatting fixes in generating the report
        Mark Miller added a comment -

        Results from John Wang:

        Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        log <all> 1000000 rand string 10 91.76 108.63 18.4%
        log <all> 1000000 rand string 25 92.39 106.79 15.6%
        log <all> 1000000 rand string 50 91.30 104.02 13.9%
        log <all> 1000000 rand string 500 86.16 63.27 -26.6%
        log <all> 1000000 rand string 1000 76.92 64.85 -15.7%
        log <all> 1000000 country 10 92.42 108.78 17.7%
        log <all> 1000000 country 25 92.60 106.26 14.8%
        log <all> 1000000 country 50 92.64 103.76 12.0%
        log <all> 1000000 country 500 83.92 50.30 -40.1%
        log <all> 1000000 country 1000 74.78 46.59 -37.7%
        log <all> 1000000 rand int 10 114.03 114.85 0.7%
        log <all> 1000000 rand int 25 113.77 112.92 -0.7%
        log <all> 1000000 rand int 50 113.36 109.56 -3.4%
        log <all> 1000000 rand int 500 103.90 66.29 -36.2%
        log <all> 1000000 rand int 1000 89.52 70.67 -21.1%
        Jake Mannix added a comment -

        Hah! Thanks for posting that, Mark! Much easier to read.

        Hey John, can you comment with your hardware specs on this, so it can be recorded for posterity?

        John Wang added a comment -

        My machine HW spec:

        Model Name: MacBook Pro
        Model Identifier: MacBookPro3,1
        Processor Name: Intel Core 2 Duo
        Processor Speed: 2.4 GHz
        Number Of Processors: 1
        Total Number Of Cores: 2
        L2 Cache: 4 MB
        Memory: 4 GB
        Bus Speed: 800 MHz

        Mark Miller added a comment - edited

        Another run:

        I made the changes to int/string comparator to do the faster compare.
        Java 1.5.0_20
        Laptop - 64-bit OS - 64-bit JVM
        Quad Core - 2.0 GHz
        Ubuntu 9.10, Kernel 2.6.31
        4 GB RAM

        Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        log 1 317925 title 10 87.38 75.42 -13.7%
        log 1 317925 title 25 86.55 74.49 -13.9%
        log 1 317925 title 50 90.49 71.90 -20.5%
        log 1 317925 title 100 88.07 83.08 -5.7%
        log 1 317925 title 500 76.67 54.34 -29.1%
        log 1 317925 title 1000 69.29 38.54 -44.4%
        log <all> 1000000 title 10 109.01 92.78 -14.9%
        log <all> 1000000 title 25 108.30 89.43 -17.4%
        log <all> 1000000 title 50 107.19 85.86 -19.9%
        log <all> 1000000 title 100 94.84 80.25 -15.4%
        log <all> 1000000 title 500 78.84 49.10 -37.7%
        log <all> 1000000 title 1000 72.52 26.90 -62.9%
        log <all> 1000000 rand string 10 115.32 101.53 -12.0%
        log <all> 1000000 rand string 25 115.22 91.82 -20.3%
        log <all> 1000000 rand string 50 114.40 89.70 -21.6%
        log <all> 1000000 rand string 100 91.30 81.04 -11.2%
        log <all> 1000000 rand string 500 76.31 43.94 -42.4%
        log <all> 1000000 rand string 1000 67.33 28.29 -58.0%
        log <all> 1000000 country 10 115.40 101.46 -12.1%
        log <all> 1000000 country 25 115.06 92.15 -19.9%
        log <all> 1000000 country 50 114.03 90.06 -21.0%
        log <all> 1000000 country 100 99.30 80.07 -19.4%
        log <all> 1000000 country 500 75.64 43.44 -42.6%
        log <all> 1000000 country 1000 66.05 27.94 -57.7%
        log <all> 1000000 rand int 10 118.47 109.30 -7.7%
        log <all> 1000000 rand int 25 118.72 99.37 -16.3%
        log <all> 1000000 rand int 50 118.25 95.14 -19.5%
        log <all> 1000000 rand int 100 97.57 83.39 -14.5%
        log <all> 1000000 rand int 500 86.55 46.21 -46.6%
        log <all> 1000000 rand int 1000 78.23 28.94 -63.0%
        Yonik Seeley added a comment -

        While Java5 numbers are still important, I'd say that Java6 (-server of course) should be weighted far heavier? That must be what a majority of people are running in production for new systems?

        Mark Miller added a comment - edited

        Hey John, did you pull from a wiki dump or use the random index?

        edit

        NM - that explains your shortened table - no wiki results - I got it.

        Jake Mannix added a comment -

        Java6 is standard in production servers, since when? What justified Lucene staying on Java 1.4 for so long if this is the case? In my own experience, my last job only moved to Java 1.5 a year ago, and at my current company we're still on 1.5. I've seen that be pretty common, and I'm in the Valley, where things update pretty quickly.

        Jake Mannix added a comment -

        I would say that results on Linux and Solaris should of course be weighted more highly than results on Macs, because while I love my mac, I've yet to see a production cluster running on MacBook Pros...

        Yonik Seeley added a comment -

        Java6 is standard in production servers, since when?

        Maybe I'm wrong... it was just a guess. It's just what I've seen most customers deploying new projects on.

        What justified lucene staying java1.4 for so long if this is the case?

        The decision of what JVM a business should use to deploy their new app is a very different one from what Lucene should require.
        A minority of users may be justification enough to avoid requiring a new JVM... unless the benefits are really that huge. Lucene does not target the JVM that most people will be deploying on - if that were the case, I have a feeling we'd be switching to Java6 instead of Java5.

        Mark Miller added a comment -

        Same system, Java 1.6.0_15

        Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        log 1 317925 title 10 105.46 97.11 -7.9%
        log 1 317925 title 25 109.08 98.34 -9.8%
        log 1 317925 title 50 108.01 93.99 -13.0%
        log 1 317925 title 100 105.79 84.08 -20.5%
        log 1 317925 title 500 91.12 50.28 -44.8%
        log 1 317925 title 1000 80.51 33.59 -58.3%
        log <all> 1000000 title 10 113.89 105.39 -7.5%
        log <all> 1000000 title 25 113.14 102.13 -9.7%
        log <all> 1000000 title 50 111.30 96.51 -13.3%
        log <all> 1000000 title 100 86.77 83.86 -3.4%
        log <all> 1000000 title 500 78.00 42.15 -46.0%
        log <all> 1000000 title 1000 70.50 27.02 -61.7%
        log <all> 1000000 rand string 10 107.78 106.09 -1.6%
        log <all> 1000000 rand string 25 103.09 102.53 -0.5%
        log <all> 1000000 rand string 50 106.42 95.17 -10.6%
        log <all> 1000000 rand string 100 86.28 85.41 -1.0%
        log <all> 1000000 rand string 500 76.69 37.76 -50.8%
        log <all> 1000000 rand string 1000 68.48 22.95 -66.5%
        log <all> 1000000 country 10 103.36 106.79 3.3%
        log <all> 1000000 country 25 103.43 102.69 -0.7%
        log <all> 1000000 country 50 102.93 94.97 -7.7%
        log <all> 1000000 country 100 108.49 85.71 -21.0%
        log <all> 1000000 country 500 80.87 38.23 -52.7%
        log <all> 1000000 country 1000 67.24 22.79 -66.1%
        log <all> 1000000 rand int 10 120.59 112.03 -7.1%
        log <all> 1000000 rand int 25 119.80 107.49 -10.3%
        log <all> 1000000 rand int 50 119.96 98.84 -17.6%
        log <all> 1000000 rand int 100 88.58 89.24 0.7%
        log <all> 1000000 rand int 500 83.50 40.13 -51.9%
        log <all> 1000000 rand int 1000 74.80 23.83 -68.1%
        Mark Miller added a comment -

        Java6 is standard in production servers, since when?

        Maybe I'm wrong... it was just a guess. It's just what I've seen most customers deploying new projects on.

        That's my impression too - Java 1.6 is mainly just a bug-fix and performance release and has been out for a while, so it's usually the choice I've seen.
        Sounds like Uwe thinks it's more buggy though, so who knows if that's a good idea.

        Mark Miller added a comment -

        John, what happened to your topn:100 results?

        Yonik Seeley added a comment -

        There was a bad stretch in Java6... they plopped in a major JVM upgrade (not just bug fixes) and there were bugs. I think that's been behind us for a little while now though. If someone were starting a project today, I'd recommend the latest Java6 JVM.

        John Wang added a comment -

        bq: topn:100
        I had made changes to sortBench.py to look at each run, and forgot to add 100 back in. My bad.

        Mark Miller added a comment -

        Anyone got a Windows box to run this on? I'm only running windows on a VM these days.

        Mark Miller added a comment -

        There was a bad stretch in Java6...

        But how can that be? Number 10 of the top 10 of what's new is the -lities!

        10. The -lities: Quality, Compatibility, Stability

        Uwe Schindler added a comment -

        That's my impression too - Java 1.6 is mainly just a bug-fix and performance release and has been out for a while, so it's usually the choice I've seen. Sounds like Uwe thinks it's more buggy though, so who knows if that's a good idea

        Because of this, for Lucene 3.0 we should say it's a Java 1.5 compatible release. As Mark said, Java 6 does not contain anything really new that's usable for Lucene, so we are fine with staying on 1.5. If somebody wants to use 1.5 or 1.6, that's their choice, but we should not force people to use 1.6. If at least one developer uses 1.5 for developing, we have no problem with maybe some added functions in core classes we accidentally use (like String.isEmpty() - which is a common problem because it was added in 1.6 and many developers use it intuitively).

        Even though 1.5 is EOLed by Sun, they recently added a new release 1.5.0_21. I was also wondering about that, but it seems that Sun is still providing "support" for it.

        About the stability: maybe it is better now, but I have seen so many crashed JVMs in the earlier versions <= _12, so I stayed on 1.5. But we are also thinking of switching here at some time.

        John Wang added a comment -

        I think I found the reason for the discrepancy: 32 vs 64 bit:

        32-bit run:
        jwang-mn:benchmark jwang$ python -u sortBench.py -report john3

        Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        log <all> 1000000 rand string 10 92.24 103.65 12.4%
        log <all> 1000000 rand string 25 91.88 102.06 11.1%
        log <all> 1000000 rand string 50 91.72 99.07 8.0%
        log <all> 1000000 rand string 100 106.26 90.61 -14.7%
        log <all> 1000000 rand string 500 86.38 59.88 -30.7%
        log <all> 1000000 rand string 1000 74.88 39.93 -46.7%
        log <all> 1000000 country 10 92.33 103.79 12.4%
        log <all> 1000000 country 25 92.27 101.60 10.1%
        log <all> 1000000 country 50 91.58 99.14 8.3%
        log <all> 1000000 country 100 100.76 82.25 -18.4%
        log <all> 1000000 country 500 75.18 48.65 -35.3%
        log <all> 1000000 country 1000 67.68 32.67 -51.7%
        log <all> 1000000 rand int 10 88.14 101.93 15.6%
        log <all> 1000000 rand int 25 95.02 96.14 1.2%
        log <all> 1000000 rand int 50 96.54 89.61 -7.2%
        log <all> 1000000 rand int 100 88.58 92.06 3.9%
        log <all> 1000000 rand int 500 103.60 62.25 -39.9%
        log <all> 1000000 rand int 1000 92.36 40.84 -55.8%

        64-bit run:
        jwang-mn:benchmark jwang$ python -u sortBench.py -report john4

        Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        log <all> 1000000 rand string 10 119.59 107.52 -10.1%
        log <all> 1000000 rand string 25 119.25 105.05 -11.9%
        log <all> 1000000 rand string 50 117.22 101.99 -13.0%
        log <all> 1000000 rand string 100 95.78 86.19 -10.0%
        log <all> 1000000 rand string 500 76.05 54.71 -28.1%
        log <all> 1000000 rand string 1000 68.37 38.94 -43.0%
        log <all> 1000000 country 10 119.68 108.12 -9.7%
        log <all> 1000000 country 25 119.10 105.72 -11.2%
        log <all> 1000000 country 50 115.85 99.70 -13.9%
        log <all> 1000000 country 100 97.44 91.03 -6.6%
        log <all> 1000000 country 500 78.92 40.97 -48.1%
        log <all> 1000000 country 1000 68.48 30.43 -55.6%
        log <all> 1000000 rand int 10 121.64 108.82 -10.5%
        log <all> 1000000 rand int 25 121.68 113.92 -6.4%
        log <all> 1000000 rand int 50 120.80 110.45 -8.6%
        log <all> 1000000 rand int 100 101.36 95.68 -5.6%
        log <all> 1000000 rand int 500 90.15 60.29 -33.1%
        log <all> 1000000 rand int 1000 80.23 40.67 -49.3%
        Show
        John Wang added a comment -

        I think I found the reason for the discrepancy: 32 vs 64 bit.

        32-bit run:
        jwang-mn:benchmark jwang$ python -u sortBench.py -report john3

        Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        log <all> 1000000 rand string 10 92.24 103.65 12.4%
        log <all> 1000000 rand string 25 91.88 102.06 11.1%
        log <all> 1000000 rand string 50 91.72 99.07 8.0%
        log <all> 1000000 rand string 100 106.26 90.61 -14.7%
        log <all> 1000000 rand string 500 86.38 59.88 -30.7%
        log <all> 1000000 rand string 1000 74.88 39.93 -46.7%
        log <all> 1000000 country 10 92.33 103.79 12.4%
        log <all> 1000000 country 25 92.27 101.60 10.1%
        log <all> 1000000 country 50 91.58 99.14 8.3%
        log <all> 1000000 country 100 100.76 82.25 -18.4%
        log <all> 1000000 country 500 75.18 48.65 -35.3%
        log <all> 1000000 country 1000 67.68 32.67 -51.7%
        log <all> 1000000 rand int 10 88.14 101.93 15.6%
        log <all> 1000000 rand int 25 95.02 96.14 1.2%
        log <all> 1000000 rand int 50 96.54 89.61 -7.2%
        log <all> 1000000 rand int 100 88.58 92.06 3.9%
        log <all> 1000000 rand int 500 103.60 62.25 -39.9%
        log <all> 1000000 rand int 1000 92.36 40.84 -55.8%

        64-bit run:
        jwang-mn:benchmark jwang$ python -u sortBench.py -report john4

        Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        log <all> 1000000 rand string 10 119.59 107.52 -10.1%
        log <all> 1000000 rand string 25 119.25 105.05 -11.9%
        log <all> 1000000 rand string 50 117.22 101.99 -13.0%
        log <all> 1000000 rand string 100 95.78 86.19 -10.0%
        log <all> 1000000 rand string 500 76.05 54.71 -28.1%
        log <all> 1000000 rand string 1000 68.37 38.94 -43.0%
        log <all> 1000000 country 10 119.68 108.12 -9.7%
        log <all> 1000000 country 25 119.10 105.72 -11.2%
        log <all> 1000000 country 50 115.85 99.70 -13.9%
        log <all> 1000000 country 100 97.44 91.03 -6.6%
        log <all> 1000000 country 500 78.92 40.97 -48.1%
        log <all> 1000000 country 1000 68.48 30.43 -55.6%
        log <all> 1000000 rand int 10 121.64 108.82 -10.5%
        log <all> 1000000 rand int 25 121.68 113.92 -6.4%
        log <all> 1000000 rand int 50 120.80 110.45 -8.6%
        log <all> 1000000 rand int 100 101.36 95.68 -5.6%
        log <all> 1000000 rand int 500 90.15 60.29 -33.1%
        log <all> 1000000 rand int 1000 80.23 40.67 -49.3%
        John Wang added a comment -

        I wrote a small test and verified that the 64-bit VM's string compare is much faster than the 32-bit VM's (kinda makes sense),
        and the above numbers now all make sense.
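A micro-test of the kind John describes might look like the following. This is a hedged sketch (class and method names are made up, not the test he actually ran): time String.compareTo in a tight loop, then run the same class once under a 32-bit VM and once under a 64-bit VM and compare the elapsed times.

```java
public class StringCompareBench {
    // Returns a checksum so the JIT cannot dead-code-eliminate the loop.
    static long runLoop(String a, String b, int iters) {
        long checksum = 0;
        for (int i = 0; i < iters; i++) {
            checksum += a.compareTo(b);
        }
        return checksum;
    }

    public static void main(String[] args) {
        // Long common prefix forces compareTo to do real work per call.
        String a = "aaaaaaaaaaaaaaaaaaaab", b = "aaaaaaaaaaaaaaaaaaaac";
        long t0 = System.nanoTime();
        long checksum = runLoop(a, b, 10_000_000);
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        // Compare this number across 32-bit and 64-bit JVM runs.
        System.out.println("elapsed ms: " + elapsedMs + " (checksum " + checksum + ")");
    }
}
```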

        Uwe Schindler added a comment - edited

        So it does not have to do with Java 1.5 vs 1.6, but rather with 32 vs 64 bit. As most servers are running 64-bit, I think the new 2.9 search API is fine?

        I agree with you that the new API is cleaner overall; the old API could only be reimplemented with major refactorings, as it does not fit well into multi-segment search.

        By the way, during the refactoring for Java 5 I found some inconsistencies in MultiSearcher/ParallelMultiSearcher, which use FieldDocSortedHitQueue (it is used nowhere else anymore): when merging the queues of all Searchers during sorting, it uses some native compareTo operations, which may not work correctly with custom comparators. Is this correct? In my opinion this queue should also somehow use at least the FieldComparator's compare functions.
        Mark, I do not understand completely how this fits together. I added a warning because of some very strange (unsafe) casts in the source code, plus a SuppressWarnings("unchecked"), so it is easy to find in FieldDocSortedHitQueue. The temp variable is just there to suppress the unchecked warning (but it really needs to be fixed).

        Mark Miller added a comment - edited

        but how does this fit together.

        That's what Comparable FieldComparator#value is for: fillFields grabs all those values and loads up the FieldDoc fields, so the custom FieldComparator is tied into it. It creates Comparable objects that can be compared by the native compareTos. (The old API did the same thing.)

          /**
           * Given a queue Entry, creates a corresponding FieldDoc
           * that contains the values used to sort the given document.
           * These values are not the raw values out of the index, but the internal
           * representation of them. This is so the given search hit can be collated by
           * a MultiSearcher with other search hits.
           * 
           * @param entry The Entry used to create a FieldDoc
           * @return The newly created FieldDoc
           * @see Searchable#search(Weight,Filter,int,Sort)
           */
          FieldDoc fillFields(final Entry entry) {
            final int n = comparators.length;
            final Comparable[] fields = new Comparable[n];
            for (int i = 0; i < n; ++i) {
              fields[i] = comparators[i].value(entry.slot);
            }
            //if (maxscore > 1.0f) doc.score /= maxscore;   // normalize scores
            return new FieldDoc(entry.docID, entry.score, fields);
          }
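
To make that merge step concrete, here is a hedged sketch (illustrative names only, not Lucene's actual classes) of the kind of merge a FieldDocSortedHitQueue performs: the top-N entries from several per-segment queues are combined by the natural compareTo ordering of the Comparable sort values that fillFields produced.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MultiQueueMergeDemo {
    // Stand-in for a FieldDoc: a docID plus its Comparable sort value.
    record Hit(int docID, String sortValue) {}

    // Merge the already-collected top-N lists of each segment into a
    // global top-N, ordered by the natural compareTo of the sort values.
    static List<Hit> mergeTopN(List<List<Hit>> perSegmentTopN, int n) {
        PriorityQueue<Hit> pq =
            new PriorityQueue<>(Comparator.comparing(Hit::sortValue));
        perSegmentTopN.forEach(pq::addAll);
        List<Hit> out = new ArrayList<>();
        for (int i = 0; i < n && !pq.isEmpty(); i++) {
            out.add(pq.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> seg0 = List.of(new Hit(1, "apple"), new Hit(3, "pear"));
        List<Hit> seg1 = List.of(new Hit(7, "kiwi"), new Hit(9, "plum"));
        // Global top 3 by sortValue: apple (doc 1), kiwi (doc 7), pear (doc 3).
        System.out.println(mergeTopN(List.of(seg0, seg1), 3));
    }
}
```

This is why Uwe's concern matters: if the merge relies on the values' native compareTo, a custom comparator whose ordering disagrees with that natural ordering would merge incorrectly.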
        
        Mark Miller added a comment -

        As most servers are running 64 bit,

        Aren't we at the tipping point where even non-servers are 64-bit now? My consumer desktops/laptops have been 64-bit for years now.

        Uwe Schindler added a comment -

        it creates Comparable objects that can be compared by the native compareTos. (the old API did the same thing)

        OK, understood. I will try to fix the generics somehow to be able to remove the SuppressWarnings.

        Michael McCandless added a comment -

        New patch attached:

        • Made some basic code-level optimizations, e.g. created an explicit
          DocIDPriorityQueue (that deals in int, not Object, to avoid
          casting) and subclassed it directly to a SortByStringQueue and a
          SortByIntQueue. It turns out the if statement (when comparing
          int values) must stay, because the subtraction can overflow int.
        • Added "sortBench.py -verify", which quickly runs each API across all
          tests and confirms the results are identical – a proxy for real unit
          tests
        • Added "Source" (wiki or random) to Jira table output
        • Print java/os version at start

        I'll re-run my test.
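The overflow point is worth spelling out. A minimal sketch (not code from the patch; the names are made up) of why a subtraction-based int comparator is unsafe and the if statement has to stay:

```java
public class CompareOverflowDemo {
    // Broken: a - b overflows int when a and b have opposite signs and
    // are far apart, flipping the sign of the result.
    static int compareBySubtraction(int a, int b) {
        return a - b;
    }

    // Safe: explicit branches, which is what the "if statement" in the
    // queue's compare guards against.
    static int compareSafely(int a, int b) {
        if (a < b) return -1;
        if (a > b) return 1;
        return 0;
    }

    public static void main(String[] args) {
        int a = Integer.MIN_VALUE + 1, b = 2;
        // a < b, so a correct comparator must return a negative value:
        System.out.println(compareSafely(a, b));        // -1
        // but the subtraction wraps around to Integer.MAX_VALUE:
        System.out.println(compareBySubtraction(a, b)); // 2147483647
    }
}
```

A priority queue built on the subtracting comparator would silently order such extreme values backwards, which is why the branchy version is kept despite the extra comparison per call.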

        Michael McCandless added a comment -

        New patch, fixes silly bug in sortBench.py.

        Michael McCandless added a comment -

        Env:

        JAVA:
        java version "1.5.0_19"
        Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02)
        Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-b02, mixed mode)

        OS:
        SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris

        Results:

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        wiki log 1 318481 title 10 98.47 104.60 6.2%
        wiki log 1 318481 title 25 97.90 103.63 5.9%
        wiki log 1 318481 title 50 105.12 101.50 -3.4%
        wiki log 1 318481 title 100 102.30 108.59 6.1%
        wiki log 1 318481 title 500 89.43 79.40 -11.2%
        wiki log 1 318481 title 1000 82.83 63.75 -23.0%
        wiki log <all> 1000000 title 10 152.56 157.40 3.2%
        wiki log <all> 1000000 title 25 151.95 148.52 -2.3%
        wiki log <all> 1000000 title 50 148.52 142.90 -3.8%
        wiki log <all> 1000000 title 100 127.70 138.72 8.6%
        wiki log <all> 1000000 title 500 104.30 90.30 -13.4%
        wiki log <all> 1000000 title 1000 99.10 66.05 -33.4%
        random log <all> 1000000 rand string 10 153.13 157.74 3.0%
        random log <all> 1000000 rand string 25 128.79 150.62 17.0%
        random log <all> 1000000 rand string 50 122.46 153.95 25.7%
        random log <all> 1000000 rand string 100 116.26 141.43 21.6%
        random log <all> 1000000 rand string 500 98.24 96.17 -2.1%
        random log <all> 1000000 rand string 1000 86.38 71.95 -16.7%
        random log <all> 1000000 country 10 148.65 153.23 3.1%
        random log <all> 1000000 country 25 148.52 152.69 2.8%
        random log <all> 1000000 country 50 122.01 149.52 22.5%
        random log <all> 1000000 country 100 120.39 145.99 21.3%
        random log <all> 1000000 country 500 99.70 95.65 -4.1%
        random log <all> 1000000 country 1000 90.18 69.46 -23.0%
        random log <all> 1000000 rand int 10 150.85 171.22 13.5%
        random log <all> 1000000 rand int 25 151.13 167.94 11.1%
        random log <all> 1000000 rand int 50 152.51 162.23 6.4%
        random log <all> 1000000 rand int 100 130.54 145.04 11.1%
        random log <all> 1000000 rand int 500 108.38 43.74 -59.6%
        random log <all> 1000000 rand int 1000 98.27 63.56 -35.3%
        Michael McCandless added a comment -

        32 bit 1.5 JRE:

        JAVA:
        java version "1.5.0_19"
        Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02)
        Java HotSpot(TM) Server VM (build 1.5.0_19-b02, mixed mode)

        OS:
        SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        wiki log 1 318481 title 10 97.31 92.69 -4.7%
        wiki log 1 318481 title 25 96.74 92.09 -4.8%
        wiki log 1 318481 title 50 98.57 90.03 -8.7%
        wiki log 1 318481 title 100 97.20 103.72 6.7%
        wiki log 1 318481 title 500 84.14 78.23 -7.0%
        wiki log 1 318481 title 1000 77.84 63.62 -18.3%
        wiki log <all> 1000000 title 10 114.99 136.86 19.0%
        wiki log <all> 1000000 title 25 114.63 125.92 9.8%
        wiki log <all> 1000000 title 50 113.33 130.58 15.2%
        wiki log <all> 1000000 title 100 115.36 111.81 -3.1%
        wiki log <all> 1000000 title 500 107.30 86.16 -19.7%
        wiki log <all> 1000000 title 1000 98.07 55.39 -43.5%
        random log <all> 1000000 rand string 10 115.55 140.86 21.9%
        random log <all> 1000000 rand string 25 125.66 137.15 9.1%
        random log <all> 1000000 rand string 50 123.58 133.82 8.3%
        random log <all> 1000000 rand string 100 115.51 134.82 16.7%
        random log <all> 1000000 rand string 500 102.73 93.24 -9.2%
        random log <all> 1000000 rand string 1000 88.70 65.09 -26.6%
        random log <all> 1000000 country 10 113.92 139.72 22.6%
        random log <all> 1000000 country 25 113.44 131.36 15.8%
        random log <all> 1000000 country 50 122.88 128.62 4.7%
        random log <all> 1000000 country 100 121.88 135.58 11.2%
        random log <all> 1000000 country 500 96.94 79.38 -18.1%
        random log <all> 1000000 country 1000 82.01 62.31 -24.0%
        random log <all> 1000000 rand int 10 124.58 134.20 7.7%
        random log <all> 1000000 rand int 25 123.46 134.82 9.2%
        random log <all> 1000000 rand int 50 117.96 128.61 9.0%
        random log <all> 1000000 rand int 100 113.92 122.09 7.2%
        random log <all> 1000000 rand int 500 105.49 38.92 -63.1%
        random log <all> 1000000 rand int 1000 92.27 53.14 -42.4%
        Jake Mannix added a comment -

        Mike, thanks for all the hard work on this - it's clearly far more work than anyone has spent yet on just doing the upgrade to the newer api, and that's appreciated.

        Am I wrong in thinking that these results are pretty ambiguous? How often do people take the top 500 or top 1000 sorted hits? If you don't focus on that case (that of looking for pages 50 through 100 of normal 10-per-page search results), there's a bunch of green and a bunch of red, and the two techniques are within +/- 10-20% of each other?

        Is that what everyone else sees of Mike's newest numbers here, or am I misreading them?

        Mark Miller added a comment -

        Mike's latest results are more ambiguous - let me run the new stuff on Linux too.

        Mark Miller added a comment -

        JAVA:
        java version "1.5.0_20"
        Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02)
        Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_20-b02, mixed mode)

        OS:
        Linux quad-laptop 2.6.31-14-generic #48-Ubuntu SMP x86_64 GNU/Linux

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        wiki log 1 317914 title 10 80.09 78.20 -2.4%
        wiki log 1 317914 title 25 80.12 79.51 -0.8%
        wiki log 1 317914 title 50 78.61 76.03 -3.3%
        wiki log 1 317914 title 100 77.18 75.13 -2.7%
        wiki log 1 317914 title 500 75.01 54.74 -27.0%
        wiki log 1 317914 title 1000 67.77 41.29 -39.1%
        wiki log <all> 1000000 title 10 109.30 119.29 9.1%
        wiki log <all> 1000000 title 25 108.34 116.02 7.1%
        wiki log <all> 1000000 title 50 106.86 110.70 3.6%
        wiki log <all> 1000000 title 100 94.72 101.10 6.7%
        wiki log <all> 1000000 title 500 78.69 62.04 -21.2%
        wiki log <all> 1000000 title 1000 71.93 43.05 -40.2%
        random log <all> 1000000 rand string 10 112.81 117.80 4.4%
        random log <all> 1000000 rand string 25 113.92 115.73 1.6%
        random log <all> 1000000 rand string 50 113.55 110.08 -3.1%
        random log <all> 1000000 rand string 100 90.30 95.35 5.6%
        random log <all> 1000000 rand string 500 76.77 51.88 -32.4%
        random log <all> 1000000 rand string 1000 66.78 36.93 -44.7%
        random log <all> 1000000 country 10 114.26 118.72 3.9%
        random log <all> 1000000 country 25 113.96 115.81 1.6%
        random log <all> 1000000 country 50 113.59 109.78 -3.4%
        random log <all> 1000000 country 100 91.97 94.05 2.3%
        random log <all> 1000000 country 500 75.03 27.37 -63.5%
        random log <all> 1000000 country 1000 66.62 34.85 -47.7%
        random log <all> 1000000 rand int 10 118.06 124.42 5.4%
        random log <all> 1000000 rand int 25 117.76 120.76 2.5%
        random log <all> 1000000 rand int 50 117.35 115.81 -1.3%
        random log <all> 1000000 rand int 100 96.35 103.60 7.5%
        random log <all> 1000000 rand int 500 87.04 50.63 -41.8%
        random log <all> 1000000 rand int 1000 77.59 35.47 -54.3%
        Michael McCandless added a comment -

        I agree the new results are now more ambiguous.

        Mark Miller added a comment -

        Time for the reevaluations?

        With the previous numbers, I would have said I'd -1 it. Now the numbers have changed; it's less clear.

        However, I'm still leaning against. I don't like the 30-50% drops, even if top 500/1000 are not as common as top 10/100. It's a nasty hit for those that do it. It doesn't carry tons of weight, but I don't like it.

        I also really don't like shifting back to this API right after rolling out the new one. It's very ugly, and not a good precedent to set for our users. And unless we make a change in our back-compat policy, we are stuck with both APIs till 4.0. Managing two APIs is something else I don't like.

        Finally, creating a custom sort is an advanced operation. The vast majority of Lucene users will be fine with the built-in sorts. If you need a new custom one, you are into some serious stuff already; you can handle the new API. We have seen users handle it. Uwe had ideas for helping in that regard, and documentation can probably still be improved based on future user interactions.

        I'm not as dead set against it as I was, but I still don't think I'm for the change myself.

        Jake Mannix added a comment -

        Mark, you say that with the previous numbers you'd say "-1", but if you look at the most common use case (top 10), the simpler API is faster in almost all cases, and in some cases it's 10-20% faster. Top 500 and top 1000 are not just "not as common"; they're probably at the 1% level, or less.

        As far as shifting back, API-wise that really shouldn't be a factor: 2.9 just came out, and what, we stick with a slightly slower API (for the most common use case across all Lucene users) that happens to be more complex and, more importantly, just very nonstandard? Comparable is very familiar to everyone, even if you have to have two forms, one for primitives and one for Objects. An API which doesn't have the whole slew of compare(), compareBottom(), copy(), setBottom(), value() and setNextReader() has a tremendous advantage over one which does.

        It's "advanced" to implement a custom sort, but it will be easier if it's not complex, and then it doesn't need to be "advanced" (shouldn't we be striving for fewer APIs that are listed as "advanced", and more features which can do complex things but are still listed as things "normal users" can do?).

        I think it's a great precedent to set with users to say, "Oops! We found that this new (just now as of this version) API was unnecessarily clumsy; we're shifting back to a simpler one which is just like the one you used to have." Sticking with a worse API because it performs better only in extreme scenarios, on the grounds that "we already moved on to this new API, shouldn't go back now, don't want to admit we ever made a mistake!", is what is "ugly".

        The main thing to remember is that the entire thinking around making this API different from the old one was that it seemed a simpler API would perform much worse than this one, and it does not appear that this is the case. If that original reasoning turns out to have been incorrect, then the answer is simple: go with the simpler API now, before users get used to the new one.

        Only if it turns out I'm wrong, and lots of users often sort by field values for the top 1000 entries, or the most recent runs turn out to be flukes and not typical performance, would I change my opinion.

        Mark Miller added a comment -

        Given good enough reasons, I could see saying we made a mistake and switching back - as it is, for the reasons I've said, I don't find that to be the case. I don't feel the new API was a mistake yet.

        Lots of other guys to weigh in though. If everyone else feels like its the right move, I'm not going to -1 it - just weighing in with how I feel.

        I'm not seeing 10-20% faster across the board - on my system it doesn't even hit 10%, and I'm a Linux user and advocate. I'm all for performance, but < 10% here and there is not enough to sway me against 30-50% losses in the large-queue cases, combined with having to shift back. It's not a clear win either way, but I've said which way I lean.

        Luckily, its not just me you have to convince. Lots of smart people still to weigh in.

        Michael McCandless added a comment -

        Looking more at this, I think there are a few problems with the
        current test:

        • The "random" index is smallish – it has only 7 segments. I'd
          like to test across a wider range of segment counts.
        • My "optimized" code for the multi-PQ (old) API is too optimized –
          I conflated the comparator directly into the PQ (e.g. I have
          SortByIntQueue and SortByStringQueue, which directly subclass
          DocIDPriorityQueue and override only compare). For source code
          specialization this would be appropriate (it is "correct"), but
          for testing the two extension APIs it's not. I'll rework it to
          pull the comparator back out into a separate class.
        • We should test more realistic queries & indexes. Of all tested so
          far, query "1" on the wiki index is the most realistic, and we see
          the least gains there. The *:* query is unnatural, in part because
          I think the first N titles in the wiki index appear to be roughly
          alpha sorted. Random strings & random ints aren't very realistic
          either.
        • We need more results from diverse environments – e.g. Windows
          (different versions), different Linux distros, JREs, etc.

        Also, I really do not like the large perf hits at highish topN sizes.
        Admittedly it's unusual (though I suspect not THAT rare) to use such a
        large topN, but I especially don't like that it doesn't degrade
        gracefully – it's surprising. Likewise, I'd like to see how things
        degrade with a highish number of segments.

        I would also give more weight to the JRE 1.6, 64-bit results. Yes, we
        can debate what's the popular JRE in production today, but this API is
        going to last a looong time in Lucene, so I think we should bias
        towards what will be the future JRE. Maybe we should even test on the
        JDK 7 preview... though it's probably too early to make any decisions
        based on its performance.

        Mark Miller added a comment - edited

        Waiting for Mike's changes to try some different varying tests, but while I wait, here is a 1.6 test to go with that 1.5 test (kind of funny how it likes 100 over 50 and 500 like it does):

        JAVA:
        java version "1.6.0_15"
        Java(TM) SE Runtime Environment (build 1.6.0_15-b03)
        Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02, mixed mode)

        OS:
        Linux quad-laptop 2.6.31-14-generic #48-Ubuntu SMP x86_64 GNU/Linux

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        wiki log 1 317914 title 10 105.49 106.68 1.1%
        wiki log 1 317914 title 25 108.78 110.37 1.5%
        wiki log 1 317914 title 50 106.79 106.35 -0.4%
        wiki log 1 317914 title 100 105.26 100.10 -4.9%
        wiki log 1 317914 title 500 91.03 72.09 -20.8%
        wiki log 1 317914 title 1000 80.17 53.61 -33.1%
        wiki log <all> 1000000 title 10 114.44 116.63 1.9%
        wiki log <all> 1000000 title 25 113.03 116.77 3.3%
        wiki log <all> 1000000 title 50 111.52 112.33 0.7%
        wiki log <all> 1000000 title 100 86.97 101.80 17.1%
        wiki log <all> 1000000 title 500 77.79 66.45 -14.6%
        wiki log <all> 1000000 title 1000 70.76 46.45 -34.4%
        random log <all> 1000000 rand string 10 118.47 112.77 -4.8%
        random log <all> 1000000 rand string 25 117.98 109.12 -7.5%
        random log <all> 1000000 rand string 50 117.51 105.05 -10.6%
        random log <all> 1000000 rand string 100 86.44 96.08 11.2%
        random log <all> 1000000 rand string 500 77.03 60.82 -21.0%
        random log <all> 1000000 rand string 1000 68.84 40.33 -41.4%
        random log <all> 1000000 country 10 118.25 110.85 -6.3%
        random log <all> 1000000 country 25 117.55 110.41 -6.1%
        random log <all> 1000000 country 50 117.22 103.35 -11.8%
        random log <all> 1000000 country 100 87.01 94.02 8.1%
        random log <all> 1000000 country 500 79.53 60.63 -23.8%
        random log <all> 1000000 country 1000 72.62 40.15 -44.7%
        random log <all> 1000000 rand int 10 125.54 116.18 -7.5%
        random log <all> 1000000 rand int 25 124.83 111.81 -10.4%
        random log <all> 1000000 rand int 50 124.17 108.26 -12.8%
        random log <all> 1000000 rand int 100 88.58 100.63 13.6%
        random log <all> 1000000 rand int 500 81.81 64.23 -21.5%
        random log <all> 1000000 rand int 1000 74.14 42.79 -42.3%
        Michael McCandless added a comment -

        kind of funny how it likes 100 over 50 and 500 like it does

        I've also noticed this. It's weird.

        Michael McCandless added a comment -

        New patch attached that un-conflates the comparator & PQ. I think this patch more accurately separates the comparator from the queue, i.e., it better matches the approach we'd take if we did this for "real".

        Also, I turned on the BALANCED case in sortBench.py, which also generates a 20-segment balanced (equal segment size) index for both wiki & random, and runs the tests on those.

        Finally, I added query "2" for testing, if the index is wikipedia.

        Michael McCandless added a comment -

        Small change: apply to multiPQ the same reverseMul that singlePQ must use, to handle reversed sorts.
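        A minimal sketch of the reverseMul idea, assuming the usual convention of +1 for an ascending sort and -1 for a descending one; the class and parameter names here are illustrative, not the patch's:

```java
public class ReverseMulDemo {
    // A single multiplier flips ascending to descending: reverseMul is
    // +1 for a normal sort and -1 for a reversed one, so the same
    // comparator code serves both directions instead of duplicating it.
    static int compare(int valueA, int valueB, int reverseMul) {
        return reverseMul * Integer.compare(valueA, valueB);
    }
}
```

        Both PQ variants must apply the same multiplier, otherwise reversed sorts would silently benchmark different work.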

        Michael McCandless added a comment -

        Ugh – last one was wrong patch. This one should be right.

        Michael McCandless added a comment -

        OK yet more results, this time on 5M doc indexes:

        JAVA:
        java version "1.6.0_14"
        Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
        Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

        OS:
        SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        wiki log 1 1169569 title 10 26.80 28.08 4.8%
        wiki log 1 1169569 title 25 26.52 27.68 4.4%
        wiki log 1 1169569 title 50 26.37 26.58 0.8%
        wiki log 1 1169569 title 100 25.91 26.50 2.3%
        wiki log 1 1169569 title 500 24.28 23.78 -2.1%
        wiki log 1 1169569 title 1000 22.64 21.32 -5.8%
        wiki log 2 1088167 title 10 28.23 29.16 3.3%
        wiki log 2 1088167 title 25 27.90 28.54 2.3%
        wiki log 2 1088167 title 50 28.04 28.62 2.1%
        wiki log 2 1088167 title 100 27.57 27.88 1.1%
        wiki log 2 1088167 title 500 25.85 25.27 -2.2%
        wiki log 2 1088167 title 1000 23.65 22.33 -5.6%
        wiki log <all> 5000000 title 10 27.32 30.47 11.5%
        wiki log <all> 5000000 title 25 26.29 31.16 18.5%
        wiki log <all> 5000000 title 50 26.64 30.03 12.7%
        wiki log <all> 5000000 title 100 21.37 29.67 38.8%
        wiki log <all> 5000000 title 500 21.05 26.15 24.2%
        wiki log <all> 5000000 title 1000 19.83 21.97 10.8%
        random log <all> 5000000 rand string 10 29.00 32.39 11.7%
        random log <all> 5000000 rand string 25 28.39 32.36 14.0%
        random log <all> 5000000 rand string 50 28.74 31.90 11.0%
        random log <all> 5000000 rand string 100 28.44 31.16 9.6%
        random log <all> 5000000 rand string 500 28.20 28.03 -0.6%
        random log <all> 5000000 rand string 1000 19.47 24.50 25.8%
        random log <all> 5000000 country 10 29.18 32.15 10.2%
        random log <all> 5000000 country 25 28.66 31.74 10.7%
        random log <all> 5000000 country 50 28.02 32.00 14.2%
        random log <all> 5000000 country 100 28.68 31.27 9.0%
        random log <all> 5000000 country 500 28.16 26.66 -5.3%
        random log <all> 5000000 country 1000 19.58 23.03 17.6%
        random log <all> 5000000 rand int 10 28.75 29.51 2.6%
        random log <all> 5000000 rand int 25 28.74 28.96 0.8%
        random log <all> 5000000 rand int 50 29.29 28.63 -2.3%
        random log <all> 5000000 rand int 100 29.14 28.38 -2.6%
        random log <all> 5000000 rand int 500 28.44 24.59 -13.5%
        random log <all> 5000000 rand int 1000 20.06 21.71 8.2%
        wiki balanced 1 1169558 title 10 25.70 27.06 5.3%
        wiki balanced 1 1169558 title 25 25.91 27.26 5.2%
        wiki balanced 1 1169558 title 50 25.84 26.63 3.1%
        wiki balanced 1 1169558 title 100 25.51 25.66 0.6%
        wiki balanced 1 1169558 title 500 24.39 22.32 -8.5%
        wiki balanced 1 1169558 title 1000 22.66 18.74 -17.3%
        wiki balanced 2 1088162 title 10 27.86 28.77 3.3%
        wiki balanced 2 1088162 title 25 27.46 28.37 3.3%
        wiki balanced 2 1088162 title 50 27.49 28.05 2.0%
        wiki balanced 2 1088162 title 100 27.64 26.25 -5.0%
        wiki balanced 2 1088162 title 500 25.52 23.26 -8.9%
        wiki balanced 2 1088162 title 1000 23.77 19.69 -17.2%
        wiki balanced <all> 5000000 title 10 27.13 31.05 14.4%
        wiki balanced <all> 5000000 title 25 27.81 30.47 9.6%
        wiki balanced <all> 5000000 title 50 27.99 29.81 6.5%
        wiki balanced <all> 5000000 title 100 22.00 29.35 33.4%
        wiki balanced <all> 5000000 title 500 21.03 24.12 14.7%
        wiki balanced <all> 5000000 title 1000 20.43 19.80 -3.1%
        random balanced <all> 5000000 rand string 10 28.16 32.13 14.1%
        random balanced <all> 5000000 rand string 25 28.40 31.44 10.7%
        random balanced <all> 5000000 rand string 50 28.13 31.07 10.5%
        random balanced <all> 5000000 rand string 100 27.93 31.25 11.9%
        random balanced <all> 5000000 rand string 500 27.49 26.91 -2.1%
        random balanced <all> 5000000 rand string 1000 19.13 22.69 18.6%
        random balanced <all> 5000000 country 10 27.74 32.11 15.8%
        random balanced <all> 5000000 country 25 28.31 32.15 13.6%
        random balanced <all> 5000000 country 50 27.98 31.17 11.4%
        random balanced <all> 5000000 country 100 27.66 30.85 11.5%
        random balanced <all> 5000000 country 500 27.57 26.25 -4.8%
        random balanced <all> 5000000 country 1000 19.01 21.27 11.9%
        random balanced <all> 5000000 rand int 10 28.16 28.57 1.5%
        random balanced <all> 5000000 rand int 25 28.39 28.56 0.6%
        random balanced <all> 5000000 rand int 50 28.08 27.84 -0.9%
        random balanced <all> 5000000 rand int 100 27.81 27.23 -2.1%
        random balanced <all> 5000000 rand int 500 27.80 24.01 -13.6%
        random balanced <all> 5000000 rand int 1000 19.87 20.14 1.4%
        Jake Mannix added a comment -

        Excellent, good to see that my big-O analysis is holding up on the 5M doc set: as the sub-leading terms drop off and become negligible, any improvement of singlePQ over multiPQ starts to go away entirely.

        Still seems to be some statistical fluctuations here and there though (why would 1000 hits ever have better perf for multiPQ vs singlePQ compared to 500 hits?), but I guess that's entropy for you...

        Michael McCandless added a comment -

        Hmm disregard those results – something is wrong.

        Jake Mannix added a comment -

        What happened with these ones, Mike?

        Michael McCandless added a comment -

        First, my reverseMul changes did not actually compile (I've now added "ant compile" to sortBench.py). Second, the field we are sorting on for int seems to be null / have too many zeros. I'm still digging into why...

        Michael McCandless added a comment -

        OK, new rev attached. In previous tests, ints were restricted to 0..20000; now they span the full int range (randomly). I added "ant compile" to sortBench.py. I also improved the merging at the end of multi-PQ a bit, by getting each hit's Comparable up front (rather than recomputing it on every comparison), and I use this to return the top N Comparables, which I now print in the logs.
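        A rough sketch of that merge optimization, under the assumption that each hit's sort key is extracted exactly once before the merge rather than on every comparison; the Entry class and mergeTopN method are invented for illustration and simplify away the per-segment queues:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntFunction;

public class MultiPqMerge {
    /** A hit with its sort key pre-extracted (computed once, up front). */
    static final class Entry implements Comparable<Entry> {
        final Comparable key;
        final int doc;
        Entry(Comparable key, int doc) { this.key = key; this.doc = doc; }
        @SuppressWarnings("unchecked")
        public int compareTo(Entry o) { return key.compareTo(o.key); }
    }

    /**
     * Merge the per-segment results into a global top N. sortKey is
     * called once per hit; the merge itself then compares cached keys,
     * never going back to the (potentially expensive) lookup.
     */
    static List<Integer> mergeTopN(int[][] perSegmentDocs,
                                   IntFunction<Comparable> sortKey,
                                   int topN) {
        PriorityQueue<Entry> pq = new PriorityQueue<>();
        for (int[] segment : perSegmentDocs)
            for (int doc : segment)
                pq.add(new Entry(sortKey.apply(doc), doc)); // extract once
        List<Integer> top = new ArrayList<>();
        while (top.size() < topN && !pq.isEmpty()) top.add(pq.poll().doc);
        return top;
    }
}
```

        Caching the key matters most for string sorts, where re-deriving the Comparable per comparison would mean repeated ord-to-term lookups during the merge.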

        I'll re-run on my OpenSolaris box...

        Jake Mannix added a comment -
        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        random balanced <all> 5000000 rand int 10 22.02 22.99 4.4%
        random balanced <all> 5000000 rand int 25 21.45 22.91 6.8%
        random balanced <all> 5000000 rand int 50 21.40 22.58 5.5%
        random balanced <all> 5000000 rand int 100 21.16 21.93 3.6%
        random balanced <all> 5000000 rand int 500 21.27 17.45 -18.0%
        random balanced <all> 5000000 rand int 1000 20.62 14.33 -30.5%
        random balanced <all> 5000000 rand string 10 18.33 21.91 19.5%
        random balanced <all> 5000000 rand string 25 18.37 22.20 20.8%
        random balanced <all> 5000000 rand string 50 18.51 22.13 19.6%
        random balanced <all> 5000000 rand string 100 20.30 21.91 7.9%
        random balanced <all> 5000000 rand string 500 18.34 18.69 1.9%
        random balanced <all> 5000000 rand string 1000 17.83 15.04 -15.6%
        random balanced <all> 5000000 country 10 18.49 22.16 19.8%
        random balanced <all> 5000000 country 25 18.62 22.28 19.7%
        random balanced <all> 5000000 country 50 18.30 21.86 19.5%
        random balanced <all> 5000000 country 100 20.46 22.08 7.9%
        random balanced <all> 5000000 country 500 18.02 18.38 2.0%
        random balanced <all> 5000000 country 1000 18.28 14.62 -20.0%
        random log <all> 5000000 rand int 10 22.04 23.18 5.2%
        random log <all> 5000000 rand int 25 21.48 23.02 7.2%
        random log <all> 5000000 rand int 50 21.37 22.96 7.4%
        random log <all> 5000000 rand int 100 21.47 22.43 4.5%
        random log <all> 5000000 rand int 500 21.09 19.60 -7.1%
        random log <all> 5000000 rand int 1000 20.37 17.58 -13.7%
        random log <all> 5000000 rand string 10 18.54 22.49 21.3%
        random log <all> 5000000 rand string 25 18.50 22.32 20.6%
        random log <all> 5000000 rand string 50 18.53 22.24 20.0%
        random log <all> 5000000 rand string 100 20.46 22.43 9.6%
        random log <all> 5000000 rand string 500 18.59 20.83 12.0%
        random log <all> 5000000 rand string 1000 18.27 18.01 -1.4%
        random log <all> 5000000 country 10 18.55 22.50 21.3%
        random log <all> 5000000 country 25 18.48 22.35 20.9%
        random log <all> 5000000 country 50 18.48 22.24 20.3%
        random log <all> 5000000 country 100 20.73 22.34 7.8%
        random log <all> 5000000 country 500 18.77 20.12 7.2%
        random log <all> 5000000 country 1000 17.92 18.44 2.9%
        Jake Mannix added a comment -

        Not terribly useful, in that it's a MacOS X laptop -

        2.4 GHz Intel Core 2 Duo, 4GB 667 MHz DDR2 SDRAM,

        Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304)
        Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing)

        but it's run with the latest patch.

        Michael McCandless added a comment -

        Yet another patch. This one incorporates thread-safety fixes (from LUCENE-1994) that were messing up at least the sort-by-country cases by creating many docs with a null country. Also, I noticed we were unfairly penalizing the single-PQ test by passing in "false" for "docsScoredInOrder", whereas the multi-PQ case currently assumes docs are scored in order. So I changed that param to "true" to make the test fair.

        Michael McCandless added a comment -

        OK results using last patch:

        JAVA:
        java version "1.6.0_14"
        Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
        Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

        OS:
        SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        wiki balanced 1 1169558 title 10 25.87 25.36 -2.0%
        wiki balanced 1 1169558 title 25 24.95 26.51 6.3%
        wiki balanced 1 1169558 title 50 25.27 25.28 0.0%
        wiki balanced 1 1169558 title 100 24.81 24.25 -2.3%
        wiki balanced 1 1169558 title 500 22.91 21.37 -6.7%
        wiki balanced 1 1169558 title 1000 21.45 17.73 -17.3%
        wiki balanced 2 1088162 title 10 27.45 26.97 -1.7%
        wiki balanced 2 1088162 title 25 26.85 26.88 0.1%
        wiki balanced 2 1088162 title 50 26.66 26.66 0.0%
        wiki balanced 2 1088162 title 100 26.45 26.27 -0.7%
        wiki balanced 2 1088162 title 500 23.53 21.99 -6.5%
        wiki balanced 2 1088162 title 1000 23.03 19.38 -15.8%
        wiki balanced <all> 5000000 title 10 26.75 27.41 2.5%
        wiki balanced <all> 5000000 title 25 26.18 27.74 6.0%
        wiki balanced <all> 5000000 title 50 24.28 27.06 11.4%
        wiki balanced <all> 5000000 title 100 20.82 26.96 29.5%
        wiki balanced <all> 5000000 title 500 19.71 22.41 13.7%
        wiki balanced <all> 5000000 title 1000 19.65 18.18 -7.5%
        random balanced <all> 5000000 rand string 10 25.75 27.66 7.4%
        random balanced <all> 5000000 rand string 25 24.32 27.59 13.4%
        random balanced <all> 5000000 rand string 50 21.42 27.19 26.9%
        random balanced <all> 5000000 rand string 100 19.30 26.66 38.1%
        random balanced <all> 5000000 rand string 500 17.80 22.00 23.6%
        random balanced <all> 5000000 rand string 1000 17.30 18.12 4.7%
        random balanced <all> 5000000 country 10 28.63 28.04 -2.1%
        random balanced <all> 5000000 country 25 25.02 27.57 10.2%
        random balanced <all> 5000000 country 50 20.50 26.81 30.8%
        random balanced <all> 5000000 country 100 19.95 26.34 32.0%
        random balanced <all> 5000000 country 500 18.43 22.19 20.4%
        random balanced <all> 5000000 country 1000 18.69 17.54 -6.2%
        random balanced <all> 5000000 rand int 10 28.39 29.46 3.8%
        random balanced <all> 5000000 rand int 25 26.97 27.73 2.8%
        random balanced <all> 5000000 rand int 50 29.76 27.87 -6.4%
        random balanced <all> 5000000 rand int 100 28.63 26.59 -7.1%
        random balanced <all> 5000000 rand int 500 27.18 21.67 -20.3%
        random balanced <all> 5000000 rand int 1000 25.79 16.92 -34.4%
        wiki log 1 1169569 title 10 23.98 23.32 -2.8%
        wiki log 1 1169569 title 25 25.55 26.53 3.8%
        wiki log 1 1169569 title 50 25.44 26.04 2.4%
        wiki log 1 1169569 title 100 24.43 25.92 6.1%
        wiki log 1 1169569 title 500 23.61 22.77 -3.6%
        wiki log 1 1169569 title 1000 21.88 20.48 -6.4%
        wiki log 2 1088167 title 10 28.03 27.48 -2.0%
        wiki log 2 1088167 title 25 27.20 27.57 1.4%
        wiki log 2 1088167 title 50 27.75 27.59 -0.6%
        wiki log 2 1088167 title 100 26.77 26.69 -0.3%
        wiki log 2 1088167 title 500 25.11 24.21 -3.6%
        wiki log 2 1088167 title 1000 23.27 21.75 -6.5%
        wiki log <all> 5000000 title 10 26.51 28.24 6.5%
        wiki log <all> 5000000 title 25 25.22 27.58 9.4%
        wiki log <all> 5000000 title 50 24.51 28.05 14.4%
        wiki log <all> 5000000 title 100 20.64 27.47 33.1%
        wiki log <all> 5000000 title 500 19.69 24.32 23.5%
        wiki log <all> 5000000 title 1000 19.30 20.56 6.5%
        random log <all> 5000000 rand string 10 25.57 26.94 5.4%
        random log <all> 5000000 rand string 25 25.56 28.12 10.0%
        random log <all> 5000000 rand string 50 21.33 27.63 29.5%
        random log <all> 5000000 rand string 100 18.96 27.31 44.0%
        random log <all> 5000000 rand string 500 18.74 23.72 26.6%
        random log <all> 5000000 rand string 1000 16.88 20.39 20.8%
        random log <all> 5000000 country 10 28.96 26.71 -7.8%
        random log <all> 5000000 country 25 24.91 27.41 10.0%
        random log <all> 5000000 country 50 19.81 27.19 37.3%
        random log <all> 5000000 country 100 20.80 26.80 28.8%
        random log <all> 5000000 country 500 18.90 23.63 25.0%
        random log <all> 5000000 country 1000 18.52 19.31 4.3%
        random log <all> 5000000 rand int 10 27.48 28.12 2.3%
        random log <all> 5000000 rand int 25 28.81 27.76 -3.6%
        random log <all> 5000000 rand int 50 29.60 28.41 -4.0%
        random log <all> 5000000 rand int 100 29.16 26.44 -9.3%
        random log <all> 5000000 rand int 500 27.81 21.96 -21.0%
        random log <all> 5000000 rand int 1000 25.61 18.48 -27.8%
        Yonik Seeley added a comment -

        When I tried changing 5000000 to 4999999 I got exceptions... is this expected?

        RUN: balanced=balanced source=random query=*:* sort=sort_field:int nhits=10
          log: logs/singlePQ_balanced=20_numHits=10_query=*:*_sort=sort_field:int_source=random
        Traceback (most recent call last):
          File "sortBench.py", line 517, in <module>
            main()
          File "sortBench.py", line 374, in main
            run(mode, name)
          File "sortBench.py", line 486, in run
            singlePQ = r.runOne(s, 'singlePQ_%s' % prefix, INDEX_NUM_DOCS, query, verify=doVerify)
          File "sortBench.py", line 271, in runOne
            raise RuntimeError('indexNumDocs mismatch: expected %d but got %d' % (indexNumDocs, ndocs))
        RuntimeError: indexNumDocs mismatch: expected 4999999 but got 4999996
        
        Mark Miller added a comment -

I think that's just a limitation of the parallel indexing task? I've seen it not hit the target number exactly due to how it divides the docs between threads.

        Yonik Seeley added a comment -

        One thing that bothers me about multiPQ is the memory usage if you start paging deeper and have many segments. I've seen up to 100 segments in production systems. 100x the memory use isn't pretty.

        So another thought is... 2 queues instead of N queues?

        • search segment 1 into queue A
        • search segment 2 into queue B (with possible short circuiting by the smallest value in queueA)
        • queue B will have larger values than queue A on average, so merge queue A into queue B (unless B is much smaller?)
        • search segment 3 into queue A, short circuit by smallest in B, then merge B into A, etc
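The alternating scheme above can be sketched as a self-contained toy (hypothetical names like collectSegment/topN, plain integer scores standing in for sort values and plain java.util.PriorityQueue standing in for Lucene's queue; this is not the patch's code):

```java
import java.util.PriorityQueue;

// Two-queue sketch: each queue is a min-heap holding the current top-N
// candidates (head = weakest). Segments alternate between the queues; the
// queue filled from earlier segments supplies a short-circuit bound while
// the other queue is being filled, then the two are merged.
public class TwoQueueSketch {
    static final int TOP_N = 3;

    // Collect one segment's scores into 'dest', skipping any score that
    // cannot beat the weakest entry of the already-full 'other' queue.
    static void collectSegment(int[] scores,
                               PriorityQueue<Integer> dest,
                               PriorityQueue<Integer> other) {
        for (int score : scores) {
            if (other.size() == TOP_N && score <= other.peek()) {
                continue; // short-circuit: not competitive with prior segments
            }
            if (dest.size() < TOP_N) {
                dest.offer(score);
            } else if (score > dest.peek()) {
                dest.poll();
                dest.offer(score);
            }
        }
    }

    // Merge 'src' into 'dest', keeping the overall top-N and emptying 'src'.
    static void merge(PriorityQueue<Integer> src, PriorityQueue<Integer> dest) {
        while (!src.isEmpty()) {
            int v = src.poll();
            if (dest.size() < TOP_N) {
                dest.offer(v);
            } else if (v > dest.peek()) {
                dest.poll();
                dest.offer(v);
            }
        }
    }

    // Returns the overall top-N across all segments, best first.
    static int[] topN(int[][] segments) {
        PriorityQueue<Integer> dest = new PriorityQueue<>();
        PriorityQueue<Integer> other = new PriorityQueue<>();
        for (int[] seg : segments) {
            collectSegment(seg, dest, other);
            merge(other, dest); // fold prior results into the fresh queue
            PriorityQueue<Integer> t = dest; dest = other; other = t; // swap roles
        }
        int[] out = new int[other.size()];
        for (int i = out.length - 1; i >= 0; i--) out[i] = other.poll();
        return out;
    }

    public static void main(String[] args) {
        int[][] segments = { {5, 1, 9}, {7, 3, 8}, {2, 10, 6} };
        System.out.println(java.util.Arrays.toString(topN(segments))); // [10, 9, 8]
    }
}
```

Only two queues are ever live, so memory stays at 2x singlePQ regardless of segment count.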
        Uwe Schindler added a comment -

        Good catch!

        search segment 2 into queue B (with possible short circuiting by the smallest value in queueA)

But this needs the new Comparator API's compareBottom (with which I have no problem!). Nevertheless, I still hold the opinion that the new Collector API is much better (even if it is more complicated). Maybe the implementation behind it could be switchable and work differently. Even with the new API, we could have more than one PQ.

        Michael McCandless added a comment -

        I think thats just a limitation of the parallel indexing task? I've seen it not hit the target number exactly due to how it divides the docs between threads.

        Actually that's my bad – I divide the number by 4. So long as the number is 0 mod 4 it should work.

        Jake Mannix added a comment -

        search segment 2 into queue B (with possible short circuiting by the smallest value in queueA)

        Well, we're not doing the short circuit trick on multiPQ right now, are we? It would certainly speed things up, but requires the API have the convert() method available, which was the big savings on the API side to multiPQ. If it was available, I think multiPQ (either with N or 2 queues) would perform strictly better than singlePQ, but I didn't suggest this because it seems to negate the cleanliness of the API.

        One thing John mentioned offhand is that perhaps the convert() method could be optional? If you don't implement it, you don't get to short-circuit using knowledge of previous segments, but if you do, you get maximum performance in the cases where multiPQ performs worse (mid-range hitCount, high numResultsToReturn, and in the numeric sorting case).

        I think maybe combining this idea with 2 queues could be the best of all worlds, with best overall speed, only twice the memory of singlePQ, and the simplest API with the addition of one new optional method?

        Yonik Seeley added a comment -

        I personally think this is a ways from being resolved one way or another... we shouldn't rush it, and we also shouldn't just necessarily "revert" to the previous API. If we so end up switching away from FieldComparator, we should consider it a new change, and make it the best we can.

        the new Collector API is much better (even it is more complicated)

        The power is nice... and it does allow certain optimizations that the old one did not - such as caching a value that's not a trivial lookup by docid. But I think that when singlePQ does lose, it's perhaps due to the extra indirection overhead of FieldComparator... how else can one explain multiPQ sometimes being faster with integers?

        Mark Miller added a comment -

        Actually that's my bad - I divide the number by 4. So long as the number is 0 mod 4 it should work.

Ah right - the parallel indexing task multiplies up (e.g. you give the docs per thread, not the total docs), it doesn't divide down - so that doesn't make sense. I was confusing it with a different reporting oddity I've seen when using the parallel task - I would have to investigate again to remember it correctly though.

        Michael McCandless added a comment -

        how else can one explain multiPQ sometimes being faster with integers?

        Right, I think that's due to the higher constant overhead of single PQ. In my most recent run, multi PQ is a bit faster when sorting by int only when queue size is 10 or 25.

        Yonik Seeley added a comment -

        So if we're considering new comparator APIs, and the indirection seems to be slowing things down... one thing to think about is how to eliminate that indirection.

        Even thinking about the multiPQ case - why should one need more than a single PQ when dealing with primitives that don't depend on context (i.e. everything except ord). If the comparator API had a way to set (or return) a primitive value for a single docid, and then those were compared (either directly by the PQ or via a callback), there wouldn't be an issue with reader transitions (because you don't compare id vs id) and hence no need for multiple priority queues. Avoiding the creation of intermediate Comparable objects also seems desirable.

        Perhaps do it how "score" is handled now... inlined into Entry? Should make heap rebalancing faster (fewer callbacks, fewer array lookups).
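A minimal sketch of that idea (hypothetical names, not a Lucene API): the comparator supplies a primitive per collected doc exactly once; the entry caches it, and the heap compares cached primitives rather than doc ids, so reader transitions need no conversion and a single queue suffices:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch: inline the primitive sort value into the queue entry, the way
// "score" is handled today. The heap rebalances on cached longs with no
// callback per compare and no docid-vs-docid comparison across readers.
// All names here are illustrative.
public class InlinedValueQueue {
    static class Entry {
        final int doc;     // global doc id
        final long value;  // cached primitive sort key, computed once
        Entry(int doc, long value) { this.doc = doc; this.value = value; }
    }

    private final PriorityQueue<Entry> pq;
    private final int topN;

    InlinedValueQueue(int topN) {
        this.topN = topN;
        // min-heap on the cached value: head is the weakest competitive hit
        this.pq = new PriorityQueue<>(Comparator.comparingLong(e -> e.value));
    }

    // 'value' would come from the comparator's per-docid lookup, done once
    void collect(int doc, long value) {
        if (pq.size() < topN) {
            pq.offer(new Entry(doc, value));
        } else if (value > pq.peek().value) {
            pq.poll();
            pq.offer(new Entry(doc, value));
        }
    }

    int[] topDocsBestFirst() {
        int[] out = new int[pq.size()];
        for (int i = out.length - 1; i >= 0; i--) out[i] = pq.poll().doc;
        return out;
    }
}
```

This only works for sort keys that don't depend on per-segment context (i.e. everything except ord), which is exactly the case described above.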

        Yonik Seeley added a comment - edited

        Linux odin 2.6.28-16-generic #55-Ubuntu SMP Tue Oct 20 19:48:32 UTC 2009 x86_64 GNU/Linux
        Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
        java -Xms2048M -Xmx2048M -Xbatch -server
        Phenom II x4 3GHz (dynamic freq scaling turned off)

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        random balanced <all> 5000000 rand int 10 24.50 32.66 33.3%
        random balanced <all> 5000000 rand int 25 24.48 31.94 30.5%
        random balanced <all> 5000000 rand int 50 28.86 31.79 10.2%
        random balanced <all> 5000000 rand int 100 29.04 28.63 -1.4%
        random balanced <all> 5000000 rand int 500 28.08 23.21 -17.3%
        random balanced <all> 5000000 rand int 1000 25.20 19.04 -24.4%
        random balanced <all> 5000000 rand string 10 28.34 26.86 -5.2%
        random balanced <all> 5000000 rand string 25 26.24 26.60 1.4%
        random balanced <all> 5000000 rand string 50 18.33 25.94 41.5%
        random balanced <all> 5000000 rand string 100 17.99 25.77 43.2%
        random balanced <all> 5000000 rand string 500 17.24 21.80 26.5%
        random balanced <all> 5000000 rand string 1000 16.30 18.37 12.7%
        random balanced <all> 5000000 country 10 25.15 26.85 6.8%
        random balanced <all> 5000000 country 25 25.46 26.82 5.3%
        random balanced <all> 5000000 country 50 18.02 26.04 44.5%
        random balanced <all> 5000000 country 100 18.10 26.23 44.9%
        random balanced <all> 5000000 country 500 17.63 22.34 26.7%
        random balanced <all> 5000000 country 1000 17.33 18.46 6.5%
        random log <all> 5000000 rand int 10 24.60 32.69 32.9%
        random log <all> 5000000 rand int 25 29.35 32.85 11.9%
        random log <all> 5000000 rand int 50 29.25 32.23 10.2%
        random log <all> 5000000 rand int 100 29.26 28.87 -1.3%
        random log <all> 5000000 rand int 500 28.30 24.86 -12.2%
        random log <all> 5000000 rand int 1000 25.17 21.14 -16.0%
        random log <all> 5000000 rand string 10 25.27 26.96 6.7%
        random log <all> 5000000 rand string 25 26.32 26.95 2.4%
        random log <all> 5000000 rand string 50 18.28 26.23 43.5%
        random log <all> 5000000 rand string 100 18.06 26.23 45.2%
        random log <all> 5000000 rand string 500 17.40 22.79 31.0%
        random log <all> 5000000 rand string 1000 16.45 19.94 21.2%
        random log <all> 5000000 country 10 25.27 26.89 6.4%
        random log <all> 5000000 country 25 27.13 26.84 -1.1%
        random log <all> 5000000 country 50 26.50 26.17 -1.2%
        random log <all> 5000000 country 100 18.00 26.42 46.8%
        random log <all> 5000000 country 500 17.75 23.08 30.0%
        random log <all> 5000000 country 1000 17.41 20.25 16.3%

        Same setup, but no -Xbatch (I've found it slower sometimes):
        java -Xms2048M -Xmx2048M -server

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        random balanced <all> 5000000 rand int 10 27.83 22.38 -19.6%
        random balanced <all> 5000000 rand int 25 25.91 23.05 -11.0%
        random balanced <all> 5000000 rand int 50 28.72 22.78 -20.7%
        random balanced <all> 5000000 rand int 100 28.65 22.12 -22.8%
        random balanced <all> 5000000 rand int 500 28.21 18.96 -32.8%
        random balanced <all> 5000000 rand int 1000 25.73 16.18 -37.1%
        random balanced <all> 5000000 rand string 10 27.83 21.81 -21.6%
        random balanced <all> 5000000 rand string 25 23.30 22.30 -4.3%
        random balanced <all> 5000000 rand string 50 19.81 22.05 11.3%
        random balanced <all> 5000000 rand string 100 19.70 22.04 11.9%
        random balanced <all> 5000000 rand string 500 18.77 18.84 0.4%
        random balanced <all> 5000000 rand string 1000 17.76 16.21 -8.7%
        random balanced <all> 5000000 country 10 26.25 21.85 -16.8%
        random balanced <all> 5000000 country 25 26.36 22.30 -15.4%
        random balanced <all> 5000000 country 50 20.04 22.17 10.6%
        random balanced <all> 5000000 country 100 19.44 22.12 13.8%
        random balanced <all> 5000000 country 500 19.38 19.45 0.4%
        random balanced <all> 5000000 country 1000 19.00 16.41 -13.6%
        random log <all> 5000000 rand int 10 27.89 22.25 -20.2%
        random log <all> 5000000 rand int 25 29.01 23.16 -20.2%
        random log <all> 5000000 rand int 50 28.79 22.29 -22.6%
        random log <all> 5000000 rand int 100 29.51 22.21 -24.7%
        random log <all> 5000000 rand int 500 28.43 19.60 -31.1%
        random log <all> 5000000 rand int 1000 25.74 17.31 -32.8%
        random log <all> 5000000 rand string 10 27.64 22.05 -20.2%
        random log <all> 5000000 rand string 25 24.33 22.23 -8.6%
        random log <all> 5000000 rand string 50 19.86 22.31 12.3%
        random log <all> 5000000 rand string 100 19.67 21.99 11.8%
        random log <all> 5000000 rand string 500 19.07 19.65 3.0%
        random log <all> 5000000 rand string 1000 17.91 17.33 -3.2%
        random log <all> 5000000 country 10 26.54 22.31 -15.9%
        random log <all> 5000000 country 25 26.66 21.77 -18.3%
        random log <all> 5000000 country 50 26.65 22.33 -16.2%
        random log <all> 5000000 country 100 19.48 22.08 13.3%
        random log <all> 5000000 country 500 19.21 19.74 2.8%
        random log <all> 5000000 country 1000 18.81 17.70 -5.9%
        Yonik Seeley added a comment -

        If we do go with a multi-queue approach (anywhere from 2 to N), perhaps there are ways to optimize queue merging too. It looks like the current code fully pops all the source queues?

        If you're going to insert an element into a different queue, it's a waste to maintain heap order on the source queue, and it's less efficient to start with the smallest elements. Start with the leaves for the most effective short-circuiting. We could even optionally prune... if two children aren't competitive, neither will the parent be.
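A standalone sketch of that merge (hypothetical, not Lucene code; java.util.PriorityQueue.toArray() returns the element array in unspecified order, which is fine here since correctness doesn't depend on visit order):

```java
import java.util.List;
import java.util.PriorityQueue;

// Sketch: instead of fully popping the source queue (which re-heapifies
// after every poll and yields the smallest, least competitive entries
// first), walk its raw element array and test each entry against dest's
// current bottom. In a binary-heap array the leaves occupy the back half,
// so iterating from the end tends to reach larger entries sooner. The
// pruning step (parent <= child in a min-heap, so the parent of a
// non-competitive child is also non-competitive) is omitted here.
public class MergeSketch {
    static void mergeWithoutPopping(PriorityQueue<Integer> src,
                                    PriorityQueue<Integer> dest,
                                    int topN) {
        Object[] raw = src.toArray(); // heap's element array, no popping
        for (int i = raw.length - 1; i >= 0; i--) {
            int v = (Integer) raw[i];
            if (dest.size() == topN) {
                if (v <= dest.peek()) continue; // cannot enter the top N
                dest.poll();                    // evict current bottom
            }
            dest.offer(v);
        }
        src.clear();
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> src = new PriorityQueue<>(List.of(3, 7, 9));
        PriorityQueue<Integer> dest = new PriorityQueue<>(List.of(5, 6, 8));
        mergeWithoutPopping(src, dest, 3);
        while (!dest.isEmpty()) System.out.print(dest.poll() + " "); // 7 8 9
    }
}
```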

        Jake Mannix added a comment -

        That should speed things up, but that's way subleading in complexity. This is an additive term O(numSegments * numDesiredResults) total operations when done "slowly" (as opposed to the best merge, which is O(numDesiredResults * log(numSegments)) ), in comparison to the primary subleading piece for multiPQ, which is O(numSegments * numDesiredResults * log(numDesiredResults) * log(numHitsPerSegment) ), so that's taking a piece of the CPU time which is smaller by a factor of 20-100 already than the total PQ insert time, and reducing it by a further factor of maybe 5-10.

        If it's easy to code up, sure, why not. But it's not really "inner loop" necessary optimizations anymore, I'd argue.

        Yonik Seeley added a comment -

        The more I think about it though, the more I'd like to not simply return to compare(int doca, int docb). Getting the values for the documents to compare is not always a fast operation for custom comparators, so the ability to set/cache that value is a big win in those cases.

        Jake Mannix added a comment -

        But part of the point is that you don't have to get the values - you can have a fast in-memory structure which just encodes their sort-order, right? This is the whole point of using the ordinal - you pre-sort all of the possible values to get the ordinals, and now arbitrarily complex comparator reduces to int compare at sort time. In the custom comparators we use, for example, this allows for even sorting by multivalued fields in a custom way, via the simple compare(int doca, int docb) way.

        This reminds me: Mike, you switched the compare for ord values from being "return ordA - ordB" to being "return ordA > ordB ? 1 : (ordA == ordB ? 0 : -1)", on the basis of int overflow at some point, right? This is only true if we're really sorting by integers, which could overflow - if they're ordinals, then these are both non-negative numbers, and their difference will always be greater than -MAX_INT, so the branching can be avoided in this innermost comparison in this case.
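The overflow concern, and why it vanishes for ordinals, in a two-line illustration (plain Java, not the actual Lucene comparator):

```java
public class OrdCompareSketch {
    // Unsafe for arbitrary ints: the subtraction can wrap around.
    static int compareBySubtraction(int a, int b) { return a - b; }

    // Safe for any ints, at the cost of branching.
    static int compareBranching(int a, int b) { return a > b ? 1 : (a == b ? 0 : -1); }

    public static void main(String[] args) {
        // Arbitrary ints: Integer.MAX_VALUE - (-1) wraps to Integer.MIN_VALUE,
        // so the larger value compares as smaller.
        System.out.println(compareBySubtraction(Integer.MAX_VALUE, -1)); // -2147483648 (wrong sign)
        // Ordinals are non-negative, so ordA - ordB always lies in
        // [-Integer.MAX_VALUE, Integer.MAX_VALUE] and never wraps.
        System.out.println(compareBySubtraction(0, Integer.MAX_VALUE));  // -2147483647 (correct sign)
    }
}
```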

        Yonik Seeley added a comment -

        But part of the point is that you don't have to get the values - you can have a fast in-memory structure which just encodes their sort-order, right?

        For certain types of comparators... but some custom comparators do more work and more indirect lookups (still fast enough to be a comparator, but certainly slower than just a direct array access). It would be nice to avoid doing it more than once per id, and this is where FieldComparator is superior.

        Jake Mannix added a comment -

        Can you tell me more about this? What kind of comparator can't pre-create a fixed ordinal list for all the possible values? I'm sure I've seen this too, but I can't bring one to mind right now.

        Yonik Seeley added a comment -

        What kind of comparator can't pre-create a fixed ordinal list for all the possible values?

        It's less about "can't" and more about there being too many disadvantages to pre-create in many cases.
        Sparse representations would fit in this category... the number of documents with a value is small, so you use a hash (like Solr's query elevation component).

        Solr's random sort comparator is another - it hashes directly from docid to get the sort value. Not slow per se, but it's certainly going to be faster to avoid recalculating it on every compare.
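A minimal sketch of the sparse case described above (a hypothetical class, not Solr's actual query elevation code): only a few docs carry a value, so a hash map replaces a full per-doc array, and caching the looked-up value once per queue slot (FieldComparator-style) avoids repeating the lookup on every compare.

```java
import java.util.HashMap;
import java.util.Map;

public class SparseValueSource {
    private final Map<Integer, Integer> docToValue = new HashMap<>();
    private final int missingValue;

    SparseValueSource(int missingValue) {
        this.missingValue = missingValue;
    }

    void put(int docId, int value) {
        docToValue.put(docId, value);
    }

    // Fast enough to use in a comparator, but slower than int[] indexing;
    // this is why setting/caching the value per slot is a win here.
    int value(int docId) {
        return docToValue.getOrDefault(docId, missingValue);
    }

    public static void main(String[] args) {
        SparseValueSource src = new SparseValueSource(Integer.MAX_VALUE);
        src.put(42, 7);
        System.out.println(src.value(42));                        // 7
        System.out.println(src.value(1) == Integer.MAX_VALUE);    // true
    }
}
```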

        Michael McCandless added a comment -

        This reminds me: Mike, you switched the compare for ord values from being "return ordA - ordB" to being "return ordA > ordB ? 1 : (ordA == ordB ? 0 : -1)", on the basis of int overflow at some point, right? This is only true if we're really sorting by integers, which could overflow - if they're ordinals, then these are both non-negative numbers, and their difference will always be greater than -MAX_INT, so the branching can be avoided in this innermost comparison in this case.

        Right, in the latest patch, for ords I just do the subtraction; for arbitrary ints, I do the if statement.
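The two compare variants under discussion, side by side (illustrative only; the class and method names are made up): subtraction is safe when both inputs are non-negative ordinals, but can overflow and return the wrong sign for arbitrary ints.

```java
public class CompareSketch {
    // Safe for ordinals: both values are >= 0, so their difference always
    // fits in an int and has the correct sign.
    static int compareOrds(int ordA, int ordB) {
        return ordA - ordB;
    }

    // Required for arbitrary ints: subtraction can overflow, e.g.
    // Integer.MIN_VALUE - 1 wraps around to Integer.MAX_VALUE.
    static int compareInts(int a, int b) {
        return a > b ? 1 : (a == b ? 0 : -1);
    }

    public static void main(String[] args) {
        System.out.println(compareOrds(5, 3) > 0);                 // true
        // Subtraction gives the wrong sign here due to wraparound:
        System.out.println((Integer.MIN_VALUE - 1) > 0);           // true, yet MIN_VALUE < 1
        System.out.println(compareInts(Integer.MIN_VALUE, 1) < 0); // true
    }
}
```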

        Marvin Humphrey added a comment -

        > What kind of comparator can't pre-create a fixed ordinal list for all the
        > possible values? I'm sure I've seen this too, but I can't bring one to mind
        > right now.

        I think the only time the ordinal list can't be created is when the source
        array contains some value that can't be compared against another value – e.g.
        some variant on NULL – or when the comparison function is broken, e.g. when
        a < b and b < c but c > a.

        For current KinoSearch and future Lucy, we pre-build the ord array at index
        time and mmap it at search time. (Thanks to mmap, sort caches have virtually
        no impact on IndexReader launch time.)

        Earwin Burrfoot added a comment -

        One thing that bothers me about multiPQ is the memory usage if you start paging deeper and have many segments. I've seen up to 100 segments in production systems. 100x the memory use isn't pretty.

        That's 100x the memory only for heaps, plus memory for Comparables - not nice.

        What kind of comparator can't pre-create a fixed ordinal list for all the possible values?

        Any comparator that has query-dependent ordering. Distance sort (of any kind, be it geo, or just any kind of value being close to your sample) for instance.

        I think the only time the ordinal list can't be created is when the source array contains some value that can't be compared against another value - e.g. some variant on NULL - or when the comparison function is broken, e.g. when a < b and b < c but c > a.

        With such comparison function you're busted anyway - the order of your hits is dependent on segment traversal order for instance. If you sharded your search - it depends on the order your shards responded to meta-search. Ugly.
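A back-of-envelope estimate of the memory concern above (all sizes are assumed for illustration): single-PQ memory scales with topN alone, while multi-PQ scales with segments * topN, which is where deep paging across many segments hurts.

```java
public class PqMemoryEstimate {
    static long singlePqBytes(int topN, int bytesPerEntry) {
        return (long) topN * bytesPerEntry;
    }

    static long multiPqBytes(int segments, int topN, int bytesPerEntry) {
        return (long) segments * topN * bytesPerEntry;
    }

    public static void main(String[] args) {
        int segments = 100;      // as observed in production above
        int topN = 1000;         // deep paging
        int bytesPerEntry = 16;  // assumed: doc id plus slot/value refs

        System.out.println(singlePqBytes(topN, bytesPerEntry));           // 16000
        System.out.println(multiPqBytes(segments, topN, bytesPerEntry));  // 1600000
    }
}
```

At topN=10 with ten segments the multi-PQ overhead is trivial; it is the 100-segment, topN=1000 corner that produces the 100x factor being debated.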

        Michael McCandless added a comment -

        If the comparator API had a way to set (or return) a primitive value for a single docid, and then those were compared (either directly by the PQ or via a callback), there wouldn't be an issue with reader transitions (because you don't compare id vs id) and hence no need for multiple priority queues.

        I like this idea; I'll explore it. Also, your (Yonik's) results showed a very sizable gain for multi-PQ when sorting by int, which is surprising.

        Yonik Seeley added a comment -

        Also, your (Yonik's) results showed a very sizable gain for multi-PQ when sorting by int, which is surprising.

Yeah... I can't help but wonder if we're measuring the edge (i.e. what won't be the bottleneck in typical searches). *:* is fast and doesn't access the index at all. Int sorting is also super fast in general, so we're down to mostly measuring the indirection time and overhead of the comparators... that's good when tweaking/optimizing, but you don't necessarily want to make bigger tradeoffs based on the fastest part of the system.

        Michael McCandless added a comment -

        Agreed. Can you post your table when run on the term queries (eg '1')?

        Michael McCandless added a comment -

        Can you post your table when run on the term queries (eg '1')?

        Woops, scratch that. We don't have an int field on the wikipedia index. Hmm.

Yonik Seeley added a comment - edited

Here's some more mud to help clear the water. This is with the latest JDK7 - tested twice to be sure, and all results were within 0.5 percentage points of each other.

        Linux odin 2.6.28-16-generic #55-Ubuntu SMP Tue Oct 20 19:48:32 UTC 2009 x86_64 GNU/Linux
        Java(TM) SE Runtime Environment (build 1.7.0-ea-b74) (Oct 15 2009)
        java -Xms2048M -Xmx2048M -Xbatch -server
        Phenom II x4 3GHz (dynamic freq scaling turned off)

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        random balanced <all> 5000000 rand int 10 28.02 18.86 -32.7%
        random balanced <all> 5000000 rand int 25 27.93 18.80 -32.7%
        random balanced <all> 5000000 rand int 50 23.89 21.77 -8.9%
        random balanced <all> 5000000 rand int 100 23.74 21.21 -10.7%
        random balanced <all> 5000000 rand int 500 22.92 17.30 -24.5%
        random balanced <all> 5000000 rand int 1000 21.99 14.64 -33.4%
        random balanced <all> 5000000 rand string 10 23.63 20.58 -12.9%
        random balanced <all> 5000000 rand string 25 22.74 20.42 -10.2%
        random balanced <all> 5000000 rand string 50 16.88 21.93 29.9%
        random balanced <all> 5000000 rand string 100 19.32 21.42 10.9%
        random balanced <all> 5000000 rand string 500 18.58 18.14 -2.4%
        random balanced <all> 5000000 rand string 1000 18.08 15.25 -15.7%
        random balanced <all> 5000000 country 10 23.89 20.70 -13.4%
        random balanced <all> 5000000 country 25 22.59 20.58 -8.9%
        random balanced <all> 5000000 country 50 16.84 22.04 30.9%
        random balanced <all> 5000000 country 100 16.68 21.71 30.2%
        random balanced <all> 5000000 country 500 19.65 18.60 -5.3%
        random balanced <all> 5000000 country 1000 17.70 15.48 -12.5%
        random log <all> 5000000 rand int 10 28.31 18.94 -33.1%
        random log <all> 5000000 rand int 25 23.75 22.09 -7.0%
        random log <all> 5000000 rand int 50 23.99 21.90 -8.7%
        random log <all> 5000000 rand int 100 23.75 21.47 -9.6%
        random log <all> 5000000 rand int 500 22.83 18.41 -19.4%
        random log <all> 5000000 rand int 1000 21.99 15.96 -27.4%
        random log <all> 5000000 rand string 10 22.92 20.61 -10.1%
        random log <all> 5000000 rand string 25 23.36 22.27 -4.7%
        random log <all> 5000000 rand string 50 16.96 22.12 30.4%
        random log <all> 5000000 rand string 100 19.61 21.59 10.1%
        random log <all> 5000000 rand string 500 18.02 19.03 5.6%
        random log <all> 5000000 rand string 1000 18.54 16.51 -10.9%
        random log <all> 5000000 country 10 24.32 20.65 -15.1%
        random log <all> 5000000 country 25 23.46 20.72 -11.7%
        random log <all> 5000000 country 50 22.71 20.62 -9.2%
        random log <all> 5000000 country 100 16.78 21.78 29.8%
        random log <all> 5000000 country 500 19.14 19.22 0.4%
        random log <all> 5000000 country 1000 17.61 16.79 -4.7%

        Same setup, just w/o -Xbatch
        java -Xms2048M -Xmx2048M -server

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        random balanced <all> 5000000 rand int 10 28.63 24.16 -15.6%
        random balanced <all> 5000000 rand int 25 28.24 19.51 -30.9%
        random balanced <all> 5000000 rand int 50 29.24 19.21 -34.3%
        random balanced <all> 5000000 rand int 100 27.42 20.03 -27.0%
        random balanced <all> 5000000 rand int 500 26.38 16.82 -36.2%
        random balanced <all> 5000000 rand int 1000 26.34 14.40 -45.3%
        random balanced <all> 5000000 rand string 10 27.61 20.29 -26.5%
        random balanced <all> 5000000 rand string 25 25.84 21.80 -15.6%
        random balanced <all> 5000000 rand string 50 19.47 21.69 11.4%
        random balanced <all> 5000000 rand string 100 19.17 19.40 1.2%
        random balanced <all> 5000000 rand string 500 18.29 16.87 -7.8%
        random balanced <all> 5000000 rand string 1000 17.09 14.35 -16.0%
        random balanced <all> 5000000 country 10 22.48 21.42 -4.7%
        random balanced <all> 5000000 country 25 20.86 21.88 4.9%
        random balanced <all> 5000000 country 50 20.26 21.67 7.0%
        random balanced <all> 5000000 country 100 18.32 19.60 7.0%
        random balanced <all> 5000000 country 500 17.93 17.01 -5.1%
        random balanced <all> 5000000 country 1000 18.92 14.48 -23.5%
        random log <all> 5000000 rand int 10 28.71 24.35 -15.2%
        random log <all> 5000000 rand int 25 28.47 19.55 -31.3%
        random log <all> 5000000 rand int 50 28.19 19.38 -31.3%
        random log <all> 5000000 rand int 100 27.89 20.31 -27.2%
        random log <all> 5000000 rand int 500 25.13 17.64 -29.8%
        random log <all> 5000000 rand int 1000 26.51 15.55 -41.3%
        random log <all> 5000000 rand string 10 27.81 20.39 -26.7%
        random log <all> 5000000 rand string 25 25.66 21.96 -14.4%
        random log <all> 5000000 rand string 50 17.70 20.17 14.0%
        random log <all> 5000000 rand string 100 19.28 19.63 1.8%
        random log <all> 5000000 rand string 500 18.03 17.45 -3.2%
        random log <all> 5000000 rand string 1000 18.84 15.29 -18.8%
        random log <all> 5000000 country 10 22.58 21.47 -4.9%
        random log <all> 5000000 country 25 21.09 20.36 -3.5%
        random log <all> 5000000 country 50 21.03 21.80 3.7%
        random log <all> 5000000 country 100 18.45 21.38 15.9%
        random log <all> 5000000 country 500 17.89 17.69 -1.1%
        random log <all> 5000000 country 1000 18.93 15.62 -17.5%
        Michael McCandless added a comment -
        • Added inlined single PQ approach, for sorting by int, so now
          sortBench.py runs trunk as baseline and inlined single PQ as
          "new". It now only does the int sort. The external API now looks
          like function queries' DocValues (I added a simple
          IntDocValueSource/IntDocValues, that has one method to get the int
          for a given docID).
        • Added random int field when building wiki index; you'll have to
          remove existing wiki index. This is so we can test "more normal"
          queries, sorting by int.
        • Dropped number of threads during indexing to 1, and seeded the
          Random, so that we all produce the same index
        Michael McCandless added a comment -

        JAVA:
        java version "1.6.0_14"
        Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
        Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

        OS:
        SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris

        Source Seg size Query Tot hits Sort Top N QPS old QPS new Pct change
        wiki log 1 1170209 rand int 10 27.08 27.52 1.6%
        wiki log 1 1170209 rand int 25 26.95 27.72 2.9%
        wiki log 1 1170209 rand int 50 27.54 27.03 -1.9%
        wiki log 1 1170209 rand int 100 27.08 24.19 -10.7%
        wiki log 1 1170209 rand int 500 26.13 26.88 2.9%
        wiki log 1 1170209 rand int 1000 24.30 25.91 6.6%
        wiki log 2 1088727 rand int 10 29.66 29.45 -0.7%
        wiki log 2 1088727 rand int 25 28.80 29.47 2.3%
        wiki log 2 1088727 rand int 50 29.57 30.21 2.2%
        wiki log 2 1088727 rand int 100 30.74 27.30 -11.2%
        wiki log 2 1088727 rand int 500 28.72 30.17 5.0%
        wiki log 2 1088727 rand int 1000 27.20 29.21 7.4%
        wiki log <all> 5000000 rand int 10 33.43 31.09 -7.0%
        wiki log <all> 5000000 rand int 25 33.24 31.77 -4.4%
        wiki log <all> 5000000 rand int 50 32.89 31.68 -3.7%
        wiki log <all> 5000000 rand int 100 32.04 21.33 -33.4%
        wiki log <all> 5000000 rand int 500 31.25 30.59 -2.1%
        wiki log <all> 5000000 rand int 1000 29.48 29.72 0.8%
        random log <all> 5000000 rand int 10 32.61 31.64 -3.0%
        random log <all> 5000000 rand int 25 32.98 31.79 -3.6%
        random log <all> 5000000 rand int 50 32.63 30.89 -5.3%
        random log <all> 5000000 rand int 100 32.08 21.60 -32.7%
        random log <all> 5000000 rand int 500 30.48 30.65 0.6%
        random log <all> 5000000 rand int 1000 28.95 29.44 1.7%

        This table compares trunk (= old) with the "inline int value directly into single PQ" approach (=new). So, a green result means the inline-single-PQ is faster; red means it's slower.

        The results baffle me. I would have expected for the 5M hits, with shallow topN, that the diffs would be minor since the sub-leading cost should be in the noise for either approach, and then as we go to fewer hits and deeper topN, that the inline-single-PQ approach would be faster. And, we still see strangeness at topN=100 where current trunk is always substantially better. The only difference between these ought to be the constant in front of the net number of insertions. Strange!

        Michael McCandless added a comment -

        The number of insertions into the queue is minuscule for these tests.
        E.g. with topN=10, the query "1" against the 5M wikipedia index causes
        110 insertions.

        Even at topN=1000 we see only 8053 insertions.

        So, the time difference of these runs is really driven by the "compare
        to bottom" check that's done for every hit.

        What baffles me is even if I take the inline-single-PQ from the last
        patch, and instead of invoking a separate class's
        "IntDocValues.intValue(doc)" I look it up straight from the int[] I
        get from FieldCache, I'm still seeing worse performance vs trunk.

        I think at this point this test is chasing java ghosts, so, we really
        can't conclude much.

        Also, I think, if you are sorting by native value per doc, likely the
        fastest way to take "bottom" into account is to push the check all the
        way down into the bottom TermScorers that're pulling docIDs from the
        posting lists. Ie, if your queue has converged, and you know a given
        doc must have value < 7 (say) to get into the queue, you can check
        each doc's value immediately on pulling it from the posting list and
        skip it if it won't compete (and, if you don't require exact total
        hits count).

        For queries that are otherwise costly, this can save a lot of CPU.
        This is what the source code specialization (LUCENE-1594) does, and it
        results in excellent gains (if I'm remembering right!).
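The "push the bottom check down" idea above can be sketched as follows. This is a toy stand-in, not the LUCENE-1594 specialized code; in the real version the check would live inside the scorers pulling docIDs from the posting lists, so non-competitive docs are skipped before any scoring work.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class BottomFilterSketch {
    static int scored;  // hits that actually paid the full per-hit cost

    // Collect the topN smallest values, skipping docs that can't compete
    // once the queue has converged.
    static int[] collectTopN(int[] docValues, int topN) {
        scored = 0;
        // Max-heap: the worst still-competitive value sits on top as "bottom".
        PriorityQueue<Integer> pq = new PriorityQueue<>((a, b) -> b - a);
        for (int v : docValues) {
            if (pq.size() == topN && v >= pq.peek()) {
                continue;  // early skip: can't make the queue, no scoring done
            }
            scored++;      // stand-in for the costly per-hit work
            pq.offer(v);
            if (pq.size() > topN) {
                pq.poll();
            }
        }
        int[] top = pq.stream().mapToInt(Integer::intValue).toArray();
        Arrays.sort(top);
        return top;
    }

    public static void main(String[] args) {
        int[] values = {7, 2, 9, 1, 5, 8, 3};  // assumed per-doc sort values
        System.out.println(Arrays.toString(collectTopN(values, 3)));  // [1, 2, 3]
        System.out.println(scored);  // 6 of 7 hits paid the full cost
    }
}
```

As the comment notes, this trades an exact total hit count for CPU: skipped docs are never counted, which is only acceptable when an approximate count is fine.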

        John Wang added a comment -

        Hi Michael:

        Any plans/decisions on moving forward with multiQ within Lucene? I am planning on making the change locally for my project, but I would rather not duplicate the work if you are planning on doing this within lucene.

        Thanks

        -John

        Michael McCandless added a comment -

        Hi John – it seems unlikely to happen any time too soon. We're still iterating here, and there are concerns (eg increased memory usage) with the multi PQ approach.

        Jake Mannix added a comment -

        The current concern is to do with the memory? I'm more concerned with the weird "java ghosts" that are flying around, sometimes swaying results by 20-40%... the memory could only be an issue on a setup with hundreds of segments and sorting the top 1000 values (do we really try to optimize for this performance case?). In the normal case (no more than tens of segments, and the top 10 or 100 hits), we're talking about what, 100-1000 PQ entries?

        John Wang added a comment -

        Hi Michael:

        Thanks for the heads up. I will work on it locally then.

        I am a bit confused here about memory: since most users don't go beyond page one, I can't see how memory is even a concern here compared to the amount of memory Lucene uses overall. Am I missing something?

        -John

        John Wang added a comment -

        I just looked at the most recent patch. Every entry in the PQ is an extra int, so even in the very very very rare and extreme case, the 100th page (assuming 10 entries per page) and a 100-segment index, we are looking at 400K.
        Is this really a concern? I must be missing something...

        -John

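        John's upper bound works out as follows; the class and method names are illustrative, not from the patch:

```java
// Worked version of the upper-bound arithmetic above: one extra int per PQ
// entry, the 100th page (10 hits/page, so a top-1000 queue per segment) and
// 100 segments.
public class MultiPQMemory {
    public static long extraBytes(int page, int hitsPerPage, int segments) {
        long entriesPerQueue = (long) page * hitsPerPage; // top-N needed to serve that page
        return entriesPerQueue * segments * Integer.BYTES; // one extra 4-byte int per entry
    }
}
```

        So 1000 entries × 100 segments × 4 bytes = 400,000 bytes, about 400 KB per in-flight query; concurrent queries scale this linearly.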
        Mark Miller added a comment -

        It's not as rare as you think - certainly not 3 verys rare.

        Earwin Burrfoot added a comment -

        Regarding memory - If I'm not misunderstanding things, you also have to materialize all field values you're sorting on for each doc in each PQ?

        Mark Miller added a comment -

        Yes, but you have to do that with the current method as well. It comes with per-segment search, really.

        Earwin Burrfoot added a comment -

        Right now DocComparator cheats and stores fields striped, keeping primitives (or, in my case, any value that can be represented with a primitive) primitive. If the same approach is taken with multi-PQs, BOOM! - there goes your API simplicity.

        John Wang added a comment -

        Mark:

        The 100th page at the same time the index is at 100 segments? How many very's would you give it?

        Earwin:

        Field values are in the FieldCache, not in the PQ. It is the PQ's memory consumption that is in question here (if I am not misunderstanding). You only materialize after the merge, which even in the N*Very case is only a page's worth, the same as the singlePQ approach.

        -John

        Mark Miller added a comment - - edited

        The 100th page at the same time the index is at 100 segments? How many very's would you give it?

        I'm not claiming the 100th page with many segments - I have no info on that, and I agree it would be more rare. But it has come to my attention that the 100th page is more common than I would have thought. (Sorry - I wasn't very clear on that in my last comment - I am just referring to the deep paging. I previously would have thought it's more rare than I do now - though even before, it's something I wouldn't want to see a huge perf drop on.)

        In any case - no one is saying this change won't happen, just that it's not likely to happen soon.

        edit

        Let me answer the question though - based on my experience with the merge factors people like to use, and the cost of optimizing, I would say 100 segments deserves no very. At best, it might be semi-rare. Mixed with the 100th-page requirement, I'd take it to rare. But that's just me guessing based on my Lucene/Solr experience - so it's not worth a whole ton.

        John Wang added a comment -

        Mark:

        The point of discussion is memory, unless a few hundred K of memory consumption implies a "huge perf drop". (I see you are being conservative and using only 1 huge )

        Even with 100 segments, which I am guessing you agree is rare, it is 400K (in this discussion I am using it as an upper bound; perhaps I should state that more explicitly), and thus my inability to understand it being a memory concern.

        BTW, I am interested in the percentage of "deep paging" you are seeing. You argue it is not rare; do you have some concrete numbers? In the stats I have seen from our production logs, and also from web search logs when I was working on that, the percentage is very very very very very (5 very's) low. (The sharp drop is usually at page 4, let alone page 100.)

        -John

        Mark Miller added a comment -

        The point of discussion is memory, unless a few hundred K of memory consumption implies a "huge perf drop". (I see you are being conservative and using only 1 huge )

        I know - I was purposely avoiding getting into the mem argument and just focusing on how rare the situation is. And whether there is going to be a huge perf drop with queue sizes of 1000, I just don't know. The tests have been changing a lot - which is why I think it's a little early to come to final conclusions.

        Even with 100 segments, which I am guessing you agree is rare, it is 400K (in this discussion I am using it as an upper bound; perhaps I should state that more explicitly), and thus my inability to understand it being a memory concern.

        Yes - I do agree its rare.

        BTW, I am interested in the percentage of "deep paging" you are seeing. You argue it is not rare; do you have some concrete numbers? In the stats I have seen from our production logs, and also from web search logs when I was working on that, the percentage is very very very very very (5 very's) low. (The sharp drop is usually at page 4, let alone page 100.)

        I don't have numbers I can share - but this isn't for situations with users paging through an interface (like a web search page) - it's users that are using Lucene for other tasks - and there are plenty of those. Lucene is used a lot for websites where users click through 10 results at a time - but it's also used in many, many other apps (and I do mean two manys!)

        Mark Miller added a comment - - edited

        Actually - while I cannot share any current info I have, I'll share an example from my last job. I worked on a system that librarians used to maintain a newspaper archive. The feed for the paper would come in daily and the librarians would "enhance" the data - adding keywords, breaking up stories, etc. Then reporters or end users could search this data. Librarians, who I learned are odd in their requirements by nature, insisted on bringing in thousands of results that they could scroll through at a time. This was demanded at paper after paper. So we regularly fed back up to 5,000 results at a time with our software (though they'd have preferred no limit - "what are you talking about! I want them all!" - we made them click more buttons for that). That's just one small example, but I know for a fact there are many, many more.

        edit

        We also actually ran into many situations where there were lots of segments in this scenario as well - before I knew better, I'd regularly build the indexes with a high merge factor for speed - and then be stuck, unable to optimize because it killed performance, and newspapers need to be up pretty much 24/7 without bringing their server to a crawl (this was before you could optimize down to n segments and work slowly over time). Not the greatest example, but a situation I found myself in.

        John Wang added a comment -

        Another observation: with the multiQ approach, it seems there would be no need for the set of OutOfOrder*Comparators, because we would know each Q corresponds to one segment, and it would not matter if docs come out of order.
        Please correct me if I am misunderstanding here.

        Earwin Burrfoot added a comment -

        though they'd have preferred no limit - "what are you talking about! I want them all!"

        Same here. People searching on a job site don't really care for the top 10 vacancies/resumes, they want eeeeeverything! that matched their requirements.

        John: on the other hand, we here already have code to merge results coming from different shards, with stripes, primitives and whistles to appease the GC. Might as well reuse it as-is to merge between segments.

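        The merge-between-segments step Earwin mentions can be sketched as a k-way merge of the per-segment top-N lists. The names here are illustrative, not Lucene (or Earwin's shard-merging) API; each segment's list is assumed already sorted descending:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Final merge step of the multi-PQ approach: each segment contributes its
// own top-N list, and a k-way merge over the list heads yields the global
// top-N without re-examining every hit.
public class MultiPQMerge {
    // Each int[] in perSegment is one segment's top hits, sorted descending.
    public static int[] mergeTopN(List<int[]> perSegment, int n) {
        // Heap of {value, segmentIndex, positionInSegment}, largest value first.
        PriorityQueue<int[]> heads =
            new PriorityQueue<>((a, b) -> Integer.compare(b[0], a[0]));
        for (int s = 0; s < perSegment.size(); s++) {
            if (perSegment.get(s).length > 0) {
                heads.add(new int[] {perSegment.get(s)[0], s, 0});
            }
        }
        List<Integer> result = new ArrayList<>();
        while (result.size() < n && !heads.isEmpty()) {
            int[] top = heads.poll();
            result.add(top[0]);                   // emit current best head
            int seg = top[1], pos = top[2] + 1;
            if (pos < perSegment.get(seg).length) {
                heads.add(new int[] {perSegment.get(seg)[pos], seg, pos});
            }
        }
        int[] out = new int[result.size()];
        for (int i = 0; i < out.length; i++) out[i] = result.get(i);
        return out;
    }
}
```

        The merge touches only n entries plus heap operations, which is why its cost is negligible next to the per-hit collection work.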
        Michael McCandless added a comment -

        Another observation, with multiQ approach, seems there would be no need for the set of OutOfOrder*Comparators.

        I think it'd still be beneficial to differentiate in-order vs out-of-order collectors, because even within one segment, if you know the docIDs arrive in order, then the compare-to-bottom is cheaper (it need not break ties).

        Even with 100 segments, which I am guessing you agree is rare, it is 400K,

        Don't forget that this is multiplied by however many queries are currently in flight.

        Yonik also raised an important difference of the single-PQ API (above: https://issues.apache.org/jira/browse/LUCENE-1997?focusedCommentId=12771008&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12771008), i.e., the fact that it references "slots" instead of "docid" means you can cache something private in your "slots" based on a previous compare/copy.

        Since each approach has distinct advantages, why not offer both ("simple" and "expert") comparator extensions APIs?

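        The "slots" point can be made concrete with a stripped-down integer comparator. This is a simplification in the spirit of Lucene's FieldComparator, not its exact API; the docValues array is a stand-in for a FieldCache int[]:

```java
// Slot-based comparator: each hit that wins a queue slot has its value
// copied into a private per-slot array, so later compares between queue
// entries read the cached slot instead of re-deriving the value per doc.
public class IntSlotComparator {
    private final int[] slotValues; // one cached value per queue slot
    private final int[] docValues;  // stand-in for a FieldCache array

    public IntSlotComparator(int numSlots, int[] docValues) {
        this.slotValues = new int[numSlots];
        this.docValues = docValues;
    }

    // Called when a hit wins a queue slot: cache its value privately.
    public void copy(int slot, int doc) {
        slotValues[slot] = docValues[doc];
    }

    // Queue reordering compares cached slots, never touching docValues again.
    public int compare(int slot1, int slot2) {
        return Integer.compare(slotValues[slot1], slotValues[slot2]);
    }

    // The per-hit check: compare an incoming doc against the bottom slot.
    public int compareBottom(int bottomSlot, int doc) {
        return Integer.compare(slotValues[bottomSlot], docValues[doc]);
    }
}
```

        The per-slot cache is what a docid-keyed "simple" API gives up: with only docids, each compare must re-derive (or re-look-up) both values.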
        Jake Mannix added a comment -

        Since each approach has distinct advantages, why not offer both ("simple" and "expert") comparator extensions APIs?

        +1 from me on this one, as long as the simpler one is around. I'll bet we'll find that we regret keeping the "expert" one by 3.2 or so though, but I'll take any compromise which gets the simpler API in there.

        Don't forget that this is multiplied by however many queries are currently in flight.

        Sure, so if you're running with 100 queries per second on a single shard (pretty fast!), with 100 segments, and you want to do sorting by value on the top 1000 values (how far down the long tail of extreme cases are we at now? Do librarians hit their search servers with 100 QPS and have indices poorly built with hundreds of segments and can't take downtime to ever optimize?), we're now talking about 40MB.

        Forty megabytes. On a beefy machine which is supposed to be handling 100QPS across an index big enough to need 100 segments. How much heap would such a machine already be allocating? 4GB? 6? More?

        We're talking about less than 1% of the heap is being used by the multiPQ approach in comparison to singlePQ.

        Mark Miller added a comment -

        If this is about ease of use, it's pretty easy to return a Comparable from the FieldCache, add a Comparable FieldComparator, and let users that need it (haven't seen the clamoring yet though) just implement getComparable(String). It's not that hard to support that with a single queue either.

        It's still my opinion that users that need this are advanced enough to look at the provided impls and figure it out pretty quickly. It's not rocket science, and each method impl is generally a couple of simple lines of code.

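        Mark's suggestion might look something like the sketch below: the user implements a single getComparable method, and a generic comparator does the slot bookkeeping behind the scenes. Both class names and the interface shape are hypothetical, not a committed Lucene API:

```java
// Hypothetical "easy" comparator: users supply one method returning a
// Comparable per doc; the generic comparator caches values per queue slot
// and compares them, mirroring the expert API's slot pattern internally.
public class ComparableComparatorSketch {
    public interface ComparableSource {
        Comparable<?> getComparable(int doc); // the one method a user writes
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    public static class ComparableFieldComparator {
        private final Comparable[] slots; // cached values, one per queue slot
        private final ComparableSource source;

        public ComparableFieldComparator(int numSlots, ComparableSource source) {
            this.slots = new Comparable[numSlots];
            this.source = source;
        }

        public void copy(int slot, int doc) {
            slots[slot] = source.getComparable(doc);
        }

        public int compare(int slot1, int slot2) {
            return ((Comparable) slots[slot1]).compareTo(slots[slot2]);
        }
    }
}
```

        The trade-off Mark notes still applies: boxing every value as a Comparable costs allocation per competitive hit, which the primitive-striped expert API avoids.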
        Erick Erickson added a comment -

        2013 Old JIRA cleanup


          People

          • Assignee:
            Michael McCandless
          • Reporter:
            Michael McCandless
          • Votes:
            0
          • Watchers:
            1
