Lucene - Core
  1. Lucene - Core
  2. LUCENE-4098

Efficient bulk operations for packed integer arrays

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA, 5.0
    • Component/s: core/other
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      There are some places in Lucene code that

      {iterate over,set}

      ranges of values of a packed integer array. Because bit-packing implementations (Packed*) tend be slower than direct implementations, this can take a lot of time.

      For example, under some scenarii, GrowableWriter can take most of its (averaged) set time in resizing operations.

      However, some bit-packing schemes, such as the one that is used by Packed64SingleBlock*, allow to implement efficient bulk operations such as get/set/fill. Implementing these bulk operations in {{PackedInts.

      {Reader,Mutable}

      }} and using them across other components instead of their single-value counterpart could help improve performance.

      1. LUCENE-4098.patch
        27 kB
        Adrien Grand
      2. LUCENE-4098.patch
        30 kB
        Michael McCandless
      3. LUCENE-4098.patch
        31 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Thanks Adrien!

        Show
        Michael McCandless added a comment - Thanks Adrien!
        Hide
        Michael McCandless added a comment -

        Patch, w/ the fix for the new random test (now passes).

        I also tweaked the random test to sometimes use the PackedInts.copy method.

        Finally, I changed PackedInts.copy to alloc a long[] of size min(capacity, len) so we don't over-allocate if the incoming mem is larger than it needs to be.

        I think it's ready! Thanks Adrien.

        Show
        Michael McCandless added a comment - Patch, w/ the fix for the new random test (now passes). I also tweaked the random test to sometimes use the PackedInts.copy method. Finally, I changed PackedInts.copy to alloc a long[] of size min(capacity, len) so we don't over-allocate if the incoming mem is larger than it needs to be. I think it's ready! Thanks Adrien.
        Hide
        Michael McCandless added a comment -

        Aha, phew! A bug in the test ... with that change the test looks like it's passing so far ... I'll beast it for a while. Thanks Adrien.

        Show
        Michael McCandless added a comment - Aha, phew! A bug in the test ... with that change the test looks like it's passing so far ... I'll beast it for a while. Thanks Adrien.
        Hide
        Adrien Grand added a comment - - edited

        The bulk get and set are not guaranteed to return/set exactly len longs (so that they can stay at a block boundary to make the subsequent reads/writes faster). So I think

        int got = packed1.get(start, buffer, offset, len);
        assertTrue(got <= len);
        int sot = packed2.set(start, buffer, offset, len);
        assertTrue(sot <= len);
        

        should be replaced with

        int got = packed1.get(start, buffer, offset, len);
        assertTrue(got <= len);
        int sot = packed2.set(start, buffer, offset, got);
        assertTrue(sot <= got);
        
        Show
        Adrien Grand added a comment - - edited The bulk get and set are not guaranteed to return/set exactly len longs (so that they can stay at a block boundary to make the subsequent reads/writes faster). So I think int got = packed1.get(start, buffer, offset, len); assertTrue(got <= len); int sot = packed2.set(start, buffer, offset, len); assertTrue(sot <= len); should be replaced with int got = packed1.get(start, buffer, offset, len); assertTrue(got <= len); int sot = packed2.set(start, buffer, offset, got); assertTrue(sot <= got);
        Hide
        Michael McCandless added a comment -

        The patch looks great!

        I wrote a new random test (in attached patch) that just randomly bulk copies a bunch of slices from one packed ints array to another and then asserts the values are correct ... but it's failing. I'm not sure why yet, and it's entirely possible it's a test bug! If you run with -Dtests.verbose=true it prints details...

        Show
        Michael McCandless added a comment - The patch looks great! I wrote a new random test (in attached patch) that just randomly bulk copies a bunch of slices from one packed ints array to another and then asserts the values are correct ... but it's failing. I'm not sure why yet, and it's entirely possible it's a test bug! If you run with -Dtests.verbose=true it prints details...
        Hide
        Adrien Grand added a comment -

        Slightly updated patch that fixes the computation of maxValue in GrowableWriter and uses | instead of Math.max to compute the max value when performing a bulk set (since GrowableWriter only works with unsigned integers).

        As a side note, the reason why I didn't write bulk get and set methods for Packed64 is that I didn't find how to do it efficiently (= with less instructions and no conditional) without writing specialized methods for every bitsPerValue.

        Show
        Adrien Grand added a comment - Slightly updated patch that fixes the computation of maxValue in GrowableWriter and uses | instead of Math.max to compute the max value when performing a bulk set (since GrowableWriter only works with unsigned integers). As a side note, the reason why I didn't write bulk get and set methods for Packed64 is that I didn't find how to do it efficiently (= with less instructions and no conditional) without writing specialized methods for every bitsPerValue.
        Hide
        Adrien Grand added a comment -

        Here is the patch for the proposed modifications. All Mutable implementations have a new efficient fill method and Packed64SingleBlock* classes also have efficient bulk get and set.

        For example, the execution time of the following (unrealistic) microbenchmark is more than twice better with the patch applied on my computer thanks to the use of PackedInts.copy instead of naive copy (see GrowableWriter#ensureCapacity).

        for (int k = 0; k < 50; ++k) {
            long start = System.nanoTime();
            GrowableWriter wrt = new GrowableWriter(1, 1 << 22, PackedInts.DEFAULT);
            for (int i = 0; i < 1 << 22; ++i) {
                wrt.set(i,i);
            }
            long end = System.nanoTime();
            System.out.println((end - start) / 1000000);
            long sum = 0;
            for (int i = 0; i < wrt.size(); ++i) {
                sum += wrt.get(i);
            }
            System.out.println(sum);
        }
        
        Show
        Adrien Grand added a comment - Here is the patch for the proposed modifications. All Mutable implementations have a new efficient fill method and Packed64SingleBlock* classes also have efficient bulk get and set. For example, the execution time of the following (unrealistic) microbenchmark is more than twice better with the patch applied on my computer thanks to the use of PackedInts.copy instead of naive copy (see GrowableWriter#ensureCapacity ). for ( int k = 0; k < 50; ++k) { long start = System .nanoTime(); GrowableWriter wrt = new GrowableWriter(1, 1 << 22, PackedInts.DEFAULT); for ( int i = 0; i < 1 << 22; ++i) { wrt.set(i,i); } long end = System .nanoTime(); System .out.println((end - start) / 1000000); long sum = 0; for ( int i = 0; i < wrt.size(); ++i) { sum += wrt.get(i); } System .out.println(sum); }

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development