Lucene - Core
LUCENE-5266

Optimization of the direct PackedInts readers

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Given that the initial focus for PackedInts readers was on in-memory readers (for storing things like the mapping from old to new doc IDs at merge time), I never spent time optimizing the direct readers, although doing so could be beneficial now that they are used for disk-based doc values.
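
      For context, here is a minimal sketch of what a direct (on-disk) packed reader has to do per lookup. It assumes big-endian packing and bitsPerValue <= 56 (so a value plus its alignment slack always fits in one long); the method and variable names are illustrative, not Lucene's actual API.

        import java.io.IOException;
        import org.apache.lucene.store.IndexInput;

        // Illustrative only (inside some hypothetical reader class): read the
        // index-th bitsPerValue-bit value of a packed stream starting at
        // startPointer. Assumes bitsPerValue <= 56.
        static long get(IndexInput in, long startPointer, int bitsPerValue, long index)
            throws IOException {
          final long bitPos = index * (long) bitsPerValue;
          in.seek(startPointer + (bitPos >>> 3));    // byte holding the value's first bit
          final int bitOffset = (int) (bitPos & 7);  // leading bits owned by the previous value
          long word = 0;
          int bitsRead = -bitOffset;                 // the first byte only contributes 8 - bitOffset bits
          while (bitsRead < bitsPerValue) {
            word = (word << 8) | (in.readByte() & 0xFFL);
            bitsRead += 8;
          }
          // drop trailing bits owned by the next value, then mask off the leading slack
          return (word >>> (bitsRead - bitsPerValue)) & ((1L << bitsPerValue) - 1);
        }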

      Attachments

      1. LUCENE-5266.patch (3 kB, Ryan Ernst)
      2. LUCENE-5266.patch (1 kB, Robert Muir)

        Activity

        Robert Muir added a comment -

        Here is a patch from playing around this morning.

        I'm afraid of specialization here, but I think this one should help the relatively low bpv cases by using readShort.
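
        A hedged sketch of that idea (not the actual patch): when a value plus its bit offset fits in 16 bits, a single readShort() replaces two readByte() calls:

          // Illustrative fragment, same assumed context as the sketch in the
          // description (`in` already seeked to the byte holding the first bit):
          if (bitsPerValue + bitOffset <= 16) {
            final int word = in.readShort() & 0xFFFF;  // one big-endian 16-bit read
            return (word >>> (16 - bitOffset - bitsPerValue)) & ((1 << bitsPerValue) - 1);
          }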

        Ryan Ernst added a comment -

        Here's another patch that optimizes for each possible number of bytes that need to be read. I haven't performance-tested it yet, but it passes tests.
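
        A minimal sketch of that specialization (the shape of the idea, not the patch's exact code), reusing the hypothetical context from the description:

          // Illustrative: branch on how many bytes the value spans and use the
          // widest matching primitive reads instead of a per-byte loop.
          final int numBytes = (bitsPerValue + bitOffset + 7) >>> 3;
          long word;
          switch (numBytes) {
            case 1: word = in.readByte() & 0xFFL; break;
            case 2: word = in.readShort() & 0xFFFFL; break;
            case 3: word = ((in.readShort() & 0xFFFFL) << 8) | (in.readByte() & 0xFFL); break;
            case 4: word = in.readInt() & 0xFFFFFFFFL; break;
            case 8: word = in.readLong(); break;
            // cases 5-7 combine readInt()/readShort()/readByte() the same way;
            // case 9 needs a ninth byte and wider intermediate shifting
            default: throw new AssertionError("elided in this sketch");
          }
          final int shift = (numBytes << 3) - bitOffset - bitsPerValue;  // trailing slack bits
          return (word >>> shift) & ((1L << bitsPerValue) - 1);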

        Adrien Grand added a comment -

        This is very close to what I was thinking about; I'm looking forward to the performance tests!

        Michael McCandless added a comment -

        I tested Ryan's patch with Karl's float benchmark (but first I switched it to use MMapDir).

        Before (4.x):

        Scoring 25000000 documents with a lucene value (without square root) source took 1361
        Scoring 25000000 documents with a lucene value (without square root) source took 1344
        Scoring 25000000 documents with a lucene value (without square root) source took 1338
        Scoring 25000000 documents with a lucene value (without square root) source took 1373
        Scoring 25000000 documents with a lucene value (without square root) source took 1344
        

        With Ryan's patch:

        Scoring 25000000 documents with a lucene value (without square root) source took 1116
        Scoring 25000000 documents with a lucene value (without square root) source took 1088
        Scoring 25000000 documents with a lucene value (without square root) source took 1086
        Scoring 25000000 documents with a lucene value (without square root) source took 1086
        Scoring 25000000 documents with a lucene value (without square root) source took 1089
        

        It's just a micro-benchmark but it looks like a nice speedup!

        Adrien Grand added a comment -

        Very nice, indeed. I've also been playing with the patch and it seems to be consistently faster than the previous version of the direct packed reader, so I'll commit it tomorrow if there is no objection.

        Robert Muir added a comment -

        FWIW I plugged in a "doublesort" to a macro-benchmark I have at work:
        5000 hard queries (OrHighHighHigh) + sort by doubledocvalues field

        name    qps    avg   p10   p50   p90   p99
        trunk   41.66  23ms  16ms  22ms  32ms  42ms
        patch   45.32  21ms  15ms  21ms  29ms  38ms
        Ryan Ernst added a comment -

        Glad to see it works! FWIW, I think we could get another slight bump by making cases 1-8 do only a single read, and then adjusting the right-shift amount accordingly to filter out the extra bytes read. That would save on the bounds checks there. In my previous encoding work, we found memory access was so fast that it was better to read more than to have any conditionals. The only caveat is that the encoding would need to ensure there are always an extra 2 bytes at the end (so cases 3, 5 and 7 would read an extra byte, and case 6 would read 2 extra bytes).

        Case 9 always requires an extra read. But really, it seems like the encoder should never choose a width that can cause that? If my math is correct, it can only happen when bpv is 57-63, and the space savings at those widths would be mostly negligible compared to just using 64.
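
        A sketch of that single-read variant, under two assumptions worth flagging: bitsPerValue <= 56, and the encoder has padded the stream so an 8-byte read at the last value's position is always in bounds:

          // Illustrative: one unconditional big-endian 8-byte read, then shift
          // and mask. No switch, no per-case bounds checks.
          final long bitPos = index * (long) bitsPerValue;
          in.seek(startPointer + (bitPos >>> 3));
          final int bitOffset = (int) (bitPos & 7);
          final long word = in.readLong();  // may read past the last value; needs trailing padding
          return (word >>> (64 - bitOffset - bitsPerValue)) & ((1L << bitsPerValue) - 1);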

        Adrien Grand added a comment -

        "The only caveat is that the encoding would need to ensure there are always an extra 2 bytes at the end."

        There are some places (codecs) where I encode many short sequences consecutively, so I care about not wasting extra bytes. But if this proves to help performance, I think it shouldn't be too hard to add the ability to have extra bytes at the end of the stream (I'm thinking about adding a new PackedInts.Format to the enum, but there might be other options).
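
        Purely as an illustration of that option (the constant name and method below are invented for this sketch, not Lucene's actual PackedInts.Format API):

          // Hypothetical: a padded format variant whose on-disk size reserves
          // 2 extra trailing bytes so readers may safely over-read.
          enum Format {
            PACKED,
            PACKED_SINGLE_BLOCK,
            PACKED_WITH_PADDING {
              @Override
              long byteCount(long valueCount, int bitsPerValue) {
                return super.byteCount(valueCount, bitsPerValue) + 2;
              }
            };

            long byteCount(long valueCount, int bitsPerValue) {
              return (valueCount * bitsPerValue + 7) / 8;  // exact bytes needed
            }
          }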

        ASF subversion and git services added a comment -

        Commit 1532623 from Ryan Ernst in branch 'dev/trunk'
        [ https://svn.apache.org/r1532623 ]

        LUCENE-5266: Improved DirectPackedReader performance

        ASF subversion and git services added a comment -

        Commit 1532655 from Ryan Ernst in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1532655 ]

        LUCENE-5266: Improved DirectPackedReader performance


          People

          • Assignee: Ryan Ernst
          • Reporter: Adrien Grand
          • Votes: 0
          • Watchers: 6
