Lucene - Core: LUCENE-4227

DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, 5.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This postings format just wraps Lucene40 (on disk), but then at search
      time it loads (up front) all terms' postings into RAM.

      You'd use this if you have insane amounts of RAM and want the fastest
      possible search performance. The postings are not compressed: docIDs
      and positions are stored as straight int[]s.

      The terms are stored as a skip list (array of byte[]), but I packed
      all terms together into a single long byte[]: I had started with an
      actual separate byte[] per term, but the added pointer deref and loss
      of locality was a lot (~2X) slower for terms-dict intensive queries
      like FuzzyQuery.
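The single packed byte[] idea can be sketched roughly as follows. This is an illustrative simplification, not the patch's code: it uses a plain binary search over a parallel offsets array instead of the skip list, and the names (PackedTerms, termBytes, termOffsets) are invented for the example. The point it shows is comparing terms in place, without materializing a per-term byte[].

```java
// Illustrative sketch (not Lucene's actual internals): all terms concatenated
// into one byte[], with termOffsets[i] marking where term i starts.
class PackedTerms {
  private final byte[] termBytes;   // all terms, concatenated, in sorted order
  private final int[] termOffsets;  // length = termCount + 1; last entry = termBytes.length

  PackedTerms(byte[] termBytes, int[] termOffsets) {
    this.termBytes = termBytes;
    this.termOffsets = termOffsets;
  }

  int termCount() {
    return termOffsets.length - 1;
  }

  /** Compare term i against a target, byte by byte, without copying the term. */
  int compare(int i, byte[] target) {
    int start = termOffsets[i], end = termOffsets[i + 1];
    int len = Math.min(end - start, target.length);
    for (int j = 0; j < len; j++) {
      int diff = (termBytes[start + j] & 0xFF) - (target[j] & 0xFF);
      if (diff != 0) return diff;
    }
    return (end - start) - target.length;
  }

  /** Binary search for a term; returns its ordinal, or -1 if absent. */
  int find(byte[] target) {
    int lo = 0, hi = termCount() - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = compare(mid, target);
      if (cmp < 0) lo = mid + 1;
      else if (cmp > 0) hi = mid - 1;
      else return mid;
    }
    return -1;
  }
}
```

Because every comparison walks a single contiguous array, lookups stay cache-friendly, which is the locality win the ~2X number above refers to.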

      Low frequency postings (docFreq <= 32 by default) store all docs, pos
      and offsets into a single int[]. High frequency postings store docs
      as int[], freqs as int[], and positions as int[][] parallel arrays.
      For skipping I just do a growing binary search.
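The "growing binary search" skip can be sketched like this. Again an illustrative sketch, not the patch's code: the names (DirectDocsEnum, docIDs, upto) are invented, and only the docs array is shown. advance() first gallops forward in exponentially growing steps from the current position, then binary-searches the bracketed window, so skipping near the current doc stays cheap while long skips remain O(log n).

```java
// Illustrative sketch (not the actual DirectPostingsFormat code): skipping
// over an in-RAM sorted int[] of docIDs via gallop + binary search.
class DirectDocsEnum {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docIDs; // all docs for one term, sorted, held in RAM
  private int upto = -1;      // index of the current doc

  DirectDocsEnum(int[] docIDs) {
    this.docIDs = docIDs;
  }

  int nextDoc() {
    upto++;
    return upto < docIDs.length ? docIDs[upto] : NO_MORE_DOCS;
  }

  int advance(int target) {
    // Gallop: grow the jump until we bracket target (or run off the end).
    int lo = upto + 1;
    int hi = lo;
    int step = 1;
    while (hi < docIDs.length && docIDs[hi] < target) {
      lo = hi + 1;
      hi += step;
      step <<= 1; // doubling steps: 1, 2, 4, 8, ...
    }
    hi = Math.min(hi, docIDs.length - 1);
    // Binary search within [lo, hi] for the first doc >= target.
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (docIDs[mid] < target) lo = mid + 1;
      else hi = mid - 1;
    }
    if (lo >= docIDs.length) return NO_MORE_DOCS;
    upto = lo;
    return docIDs[lo];
  }
}
```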

      I also made specialized DirectTermScorer and DirectExactPhraseScorer
      for the high freq case that just pull the int[] and iterate
      themselves.

      All tests pass.

      1. LUCENE-4227.patch
        81 kB
        Michael McCandless
      2. LUCENE-4227.patch
        74 kB
        Michael McCandless
      3. LUCENE-4227.patch
        91 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Patch.
        Michael McCandless added a comment -

        I ran perf tests on a 2M Wikipedia index (requires 8 GB heap: need
        more RAM to go higher!).

        Results without the specialized scorers (baseline is trunk w/ MMapDir):

                        Task    QPS base  StdDev base  QPS direct  StdDev direct      Pct diff
                    PKLookup      259.28       11.94      227.96        5.85  -18% -   -5%
                      Fuzzy1      160.21        5.11      183.91        1.48   10% -   19%
                 TermGroup1M       18.33        0.21       21.60        0.11   15% -   19%
                    SpanNear        5.79        0.16        6.86        0.31   10% -   27%
                TermBGroup1M       18.46        0.24       22.16        0.11   17% -   22%
              TermBGroup1M1P       22.47        0.65       28.04        0.67   18% -   31%
                SloppyPhrase        3.51        0.13        4.60        0.05   24% -   37%
                      IntNRQ       53.75        4.68       71.22        4.21   14% -   53%
                  OrHighHigh       18.85        0.42       26.89        2.16   28% -   57%
                   OrHighMed       37.93        0.91       54.57        5.71   25% -   62%
                     Respell      167.73        5.37      242.93        1.78   39% -   50%
                    Wildcard       46.64        1.74       69.98        3.43   37% -   63%
                     Prefix3      109.51        3.45      165.77        6.42   41% -   62%
                      Fuzzy2       56.48        2.37       88.25        0.91   48% -   64%
                 AndHighHigh       24.59        0.74       41.82        0.72   62% -   78%
                      Phrase       12.57        0.20       21.89        0.71   65% -   82%
                        Term       39.05        1.74       69.00        3.68   60% -   94%
                  AndHighMed      126.87        2.48      261.73        4.19   99% -  113%
        

        Nice speedups!

        Same run, but using trunk w/ RAMDirectory as the baseline:

                        Task    QPS base  StdDev base  QPS direct  StdDev direct      Pct diff
                    PKLookup      248.50        4.73      222.03        4.43  -14% -   -7%
                      Fuzzy1      159.41        3.65      185.32        3.15   11% -   21%
                    SpanNear        5.74        0.08        6.75        0.17   13% -   22%
                 TermGroup1M       17.78        0.42       21.03        0.68   11% -   25%
                TermBGroup1M       19.32        0.58       23.08        1.02   10% -   28%
                      IntNRQ       46.82        0.49       56.12        1.28   15% -   23%
              TermBGroup1M1P       23.27        0.46       30.14        0.91   23% -   36%
                     Respell      163.36        3.42      221.10        2.48   31% -   39%
                   OrHighMed       30.62        1.94       42.94        5.70   14% -   69%
                  OrHighHigh       17.98        0.99       25.69        3.35   17% -   70%
                     Prefix3      114.41        0.67      164.19        2.22   40% -   46%
                    Wildcard       47.58        0.36       70.47        1.20   44% -   51%
                      Fuzzy2       53.92        1.37       83.54        2.66   46% -   64%
                SloppyPhrase        5.07        0.23        8.12        0.74   39% -   82%
                 AndHighHigh       24.73        0.75       40.51        0.42   57% -   70%
                      Phrase       14.02        0.07       23.42        0.30   64% -   69%
                        Term       39.96        2.13       67.39        4.09   50% -   88%
                  AndHighMed      132.66        3.24      274.07        1.64  100% -  113%
        

        Still good speedups over the "obvious hold index in RAM" option.

        Then, just testing the specialized scorers (baseline = DirectPF without
        specialized scorers):

                        Task    QPS base  StdDev base  QPS direct  StdDev direct      Pct diff
                      IntNRQ       74.86        3.42       71.72        0.27   -8% -    0%
                    Wildcard       62.88        2.34       60.52        0.49   -7% -    0%
                     Prefix3      102.46        3.98       98.92        0.85   -7% -    1%
                 AndHighHigh       51.41        1.96       50.26        1.10   -7% -    3%
                  AndHighMed      238.18        5.17      234.14        2.83   -4% -    1%
                      Fuzzy1      179.64        1.73      177.96        3.27   -3% -    1%
                SloppyPhrase        8.97        0.37        8.93        0.48   -9% -    9%
                     Respell      223.76        1.16      222.79        2.68   -2% -    1%
                      Fuzzy2       79.62        1.38       79.31        0.90   -3% -    2%
                    SpanNear        6.83        0.25        6.89        0.31   -7% -    9%
                    PKLookup      220.25        1.46      225.17        2.56    0% -    4%
                   OrHighMed       50.70        4.27       53.20        3.95  -10% -   23%
                 TermGroup1M       22.17        0.33       23.42        0.37    2% -    8%
                TermBGroup1M       24.45        0.44       25.86        0.21    3% -    8%
              TermBGroup1M1P       30.61        0.82       32.76        0.12    3% -   10%
                        Term       68.69        3.99       74.88        0.28    2% -   16%
                  OrHighHigh       26.61        1.95       29.07        2.24   -6% -   26%
                      Phrase       13.81        0.17       15.96        0.13   13% -   17%
        

        Reasonable but not immense speedups by specializing query
        scorers.

        Michael McCandless added a comment -

        New patch, fixing previous nocommits / downgrading to TODOs. I also removed the specialized scorers since they seem not to help much.

        All tests pass, but I still need to fix all tests that now avoid MemoryPF to also avoid DirectPF. Otherwise I think it's ready...

        Robert Muir added a comment -

        Would it really be that much slower if it was slightly more reasonable, e.g. storing freqs
        in packed ints (with super-duper fast options) instead of wasting so much space on them?

        Michael McCandless added a comment -

        Would it really be that much slower if it was slightly more reasonable, e.g. storing freqs
        in packed ints (with super-duper fast options) instead of wasting so much space on them?

        Probably not that much slower? I think that's a good idea!

        But I think we can explore this after committing? There are other things we can try too (e.g. collapse the skip list into a shared int[], which I think may give a perf gain; collapse positions; etc.).
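The packed-ints idea under discussion can be sketched in plain Java as follows. This is not Lucene's actual PackedInts API, just a minimal hand-rolled illustration with invented names: each freq occupies only as many bits as the largest value needs, instead of a full 32-bit int.

```java
// Illustrative bit-packing sketch (not Lucene's PackedInts class): freqs are
// packed back-to-back into a long[], bitsPerValue bits each.
class PackedFreqs {
  private final long[] blocks;
  private final int bitsPerValue;

  PackedFreqs(int[] freqs) {
    int max = 1;
    for (int f : freqs) max = Math.max(max, f);
    this.bitsPerValue = 64 - Long.numberOfLeadingZeros(max);
    this.blocks = new long[((long) freqs.length * bitsPerValue + 63) / 64 > 0
        ? (int) (((long) freqs.length * bitsPerValue + 63) / 64) : 1];
    for (int i = 0; i < freqs.length; i++) set(i, freqs[i]);
  }

  private void set(int index, long value) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    blocks[block] |= value << shift;
    if (shift + bitsPerValue > 64) {        // value straddles two longs
      blocks[block + 1] |= value >>> (64 - shift);
    }
  }

  int get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long value = blocks[block] >>> shift;
    if (shift + bitsPerValue > 64) {        // pull the high bits from the next long
      value |= blocks[block + 1] << (64 - shift);
    }
    return (int) (value & ((1L << bitsPerValue) - 1));
  }
}
```

For typical freq distributions (mostly small values), this cuts the per-value cost from 32 bits to something closer to the entropy of the data, at the price of a shift-and-mask on each read.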

        Robert Muir added a comment -

        Yeah, i don't think we need to solve it before committing.

        I do think maybe this class needs some more warnings, to me it seems it will use crazy amounts of RAM.
        I also am not sure I like the name "Direct"... is it crazy to suggest "Instantiated"?

        Michael McCandless added a comment -

        I do think maybe this class needs some more warnings, to me it seems it will use crazy amounts of RAM.

        I'll add some scary warnings.

        I also am not sure I like the name "Direct"... is it crazy to suggest "Instantiated"?

        It is very much like the old Instantiated (though I think its terms dict is faster than Instantiated's)... but I didn't really like the name "Instantiated"... I had picked Direct because it "directly" represents the postings... but maybe we can find a better name.

        I will update MIGRATE.txt to explain how "Direct" (or whatever we name it) is the closest match if you were previously using Instantiated...

        Robert Muir added a comment -

        It is very much like the old instantiated (though I think its terms dict is faster than instantiated's)... but I didn't really like the name "Instanstiated"... I had picked Direct because it "directly" represents the postings ... but maybe we can find a better name.

        OK, I think what would be better is a better synonym for "Uncompressed". I realized Direct is consistent with packed ints
        or whatever... but I don't think it should use this name either; it's not intuitive.

        Michael McCandless added a comment -

        New patch, adding scary warning & MIGRATE.txt entry, fixing javadoc errors, and adding lucene.experimental ... still haven't thought of another name yet ...

        Robert Muir added a comment -

        I don't have a better name either. Let's just commit it with this one and think about it later!


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            3
