Lucene - Core
  LUCENE-5145

Added AppendingPackedLongBuffer & extended AbstractAppendingLongBuffer family (customizable compression ratio + bulk retrieval)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Made acceptableOverheadRatio configurable.
      Added bulk get to the AbstractAppendingLongBuffer classes for faster retrieval.
      Introduced a new variant, AppendingPackedLongBuffer, which relies solely on PackedInts as a back-end. This new class is useful where people have non-negative numbers with a fairly uniform distribution over a fixed (limited) range, e.g. facet ordinals.
      To distinguish it from AppendingPackedLongBuffer, the delta-based AppendingLongBuffer was renamed to AppendingDeltaPackedLongBuffer.
      Fixed an issue with NullReader where it didn't respect its valueCount in bulk gets.
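      The acceptableOverheadRatio knob decides whether values are stored fully bit-packed (compact, but each read needs shift/mask work) or padded up to a byte boundary (larger, but a read is a plain array access). As a rough illustration of that trade-off (a hypothetical sketch, not Lucene's actual Packed64/Direct8 code), packing 3-bit values both ways can look like:

```java
// Hypothetical sketch of the storage trade-off behind acceptableOverheadRatio:
// compact bit-packing (Packed64-style) vs. byte-aligned storage (Direct8-style).
// This is illustrative code, not Lucene's implementation.
class PackingSketch {

    // Compact: pack each value into a contiguous bit stream over a long[].
    static long[] packCompact(int[] values, int bitsPerValue) {
        long[] packed = new long[(values.length * bitsPerValue + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * bitsPerValue;
            int word = bitPos >>> 6, shift = bitPos & 63;
            packed[word] |= ((long) values[i]) << shift;
            if (shift + bitsPerValue > 64) {          // value straddles two words
                packed[word + 1] |= ((long) values[i]) >>> (64 - shift);
            }
        }
        return packed;
    }

    static int getCompact(long[] packed, int bitsPerValue, int i) {
        int bitPos = i * bitsPerValue;
        int word = bitPos >>> 6, shift = bitPos & 63;
        long v = packed[word] >>> shift;
        if (shift + bitsPerValue > 64) {              // pull in bits from next word
            v |= packed[word + 1] << (64 - shift);
        }
        return (int) (v & ((1L << bitsPerValue) - 1));
    }

    // Byte-aligned: waste 5 of 8 bits per 3-bit value, but reads are trivial.
    static byte[] packAligned(int[] values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) out[i] = (byte) values[i];
        return out;
    }
}
```

      With 3-bit values, the compact layout uses 3 bits per element while the aligned layout uses 8; a higher acceptableOverheadRatio lets the library pick the faster, byte-aligned form.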

      1. LUCENE-5145.patch
        67 kB
        Boaz Leskes
      2. LUCENE-5145.v2.patch
        73 kB
        Boaz Leskes

        Activity

        Boaz Leskes added a comment - - edited

        While making the above changes I did some measurements which I feel are also useful to share.

        PackedInts trades CPU for better CPU cache & memory usage. PackedInts exposes an acceptableOverheadRatio parameter to control that trade-off, but the parameter was not available in the AbstractAppendingLongBuffer family, which is built on top of those structures. This is especially important when you do not rely on AbstractAppendingLongBuffer.iterator() to extract your data. Here are some experiments I ran on my laptop using BenchmarkAppendLongBufferRead, which is included in the patch. The program allows you to play with different read strategies and data sizes and measure reading times.

        This is the result of using AppendingDeltaPackedLongBuffer (previously called AppendingLongBuffer) to sequentially read an array of 500,000 elements using its get method. The data was uniformly distributed numbers between 0 & 7. The program measured 10,000 such reads; the total time is the time it took to perform all of them. The output also shows the number of bits used to store the elements and the storage class used.

        ------- Storage: DELTA_PACKED, Read: SEQUENTIAL, Read size: 1
        SINGLE GET: 3 bits ratio 0.00 (i.e., 3 bits) total time: 22.18s avg: 2.22ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Packed64, 223.76kb)
        SINGLE GET: 3 bits ratio 7.00 (i.e., 8 bits) total time: 19.14s avg: 1.91ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Direct8, 521.13kb)

        As you can see, when retrieving elements one by one, the byte-based implementation is slightly faster. For comparison, here is the new AppendingPackedLongBuffer with the same setup:

        ------- Storage: PACKED, Read: SEQUENTIAL, Read size: 1
        SINGLE GET: 3 bits ratio 0.00 (i.e., 3 bits) total time: 16.69s avg: 1.67ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Packed64, 219.76kb)
        SINGLE GET: 3 bits ratio 7.00 (i.e., 8 bits) total time: 13.47s avg: 1.35ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Direct8, 517.13kb)
        Besides being faster, it shows the same behavior.

        For random reads, the classes display similar behavior:

        ------- Storage: DELTA_PACKED, Read: RANDOM, Read size: 1
        SINGLE GET: 3 bits ratio 0.00 (i.e., 3 bits) total time: 23.13s avg: 2.31ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Packed64, 223.76kb)
        SINGLE GET: 3 bits ratio 7.00 (i.e., 8 bits) total time: 19.38s avg: 1.94ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Direct8, 521.13kb)

        ------- Storage: PACKED, Read: RANDOM, Read size: 1
        SINGLE GET: 3 bits ratio 0.00 (i.e., 3 bits) total time: 19.23s avg: 1.92ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Packed64, 219.76kb)
        SINGLE GET: 3 bits ratio 7.00 (i.e., 8 bits) total time: 15.95s avg: 1.60ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Direct8, 517.13kb)

        Next I looked at the effect of exposing the bulk reads offered by the PackedInts structures in the AppendingLongBuffer family. Here are some results from the new packed implementation, this time reading 4 and 16 consecutive elements in a single read.

        ------- Storage: PACKED, Read: SEQUENTIAL, Read size: 4
        SINGLE GET: 3 bits ratio 0.00 (i.e., 3 bits) total time: 11.16s avg: 1.12ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Packed64, 219.76kb)
        BULK GET: 3 bits ratio 0.00 (i.e., 3 bits) total time: 24.22s avg: 2.42ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Packed64, 219.76kb)
        SINGLE GET: 3 bits ratio 7.00 (i.e., 8 bits) total time: 8.35s avg: 0.84ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Direct8, 517.13kb)
        BULK GET: 3 bits ratio 7.00 (i.e., 8 bits) total time: 8.44s avg: 0.84ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Direct8, 517.13kb)

        ------- Storage: PACKED, Read: SEQUENTIAL, Read size: 16
        SINGLE GET: 3 bits ratio 0.00 (i.e., 3 bits) total time: 9.63s avg: 0.96ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Packed64, 219.76kb)
        BULK GET: 3 bits ratio 0.00 (i.e., 3 bits) total time: 12.52s avg: 1.25ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Packed64, 219.76kb)
        SINGLE GET: 3 bits ratio 7.00 (i.e., 8 bits) total time: 7.46s avg: 0.75ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Direct8, 517.13kb)
        BULK GET: 3 bits ratio 7.00 (i.e., 8 bits) total time: 3.22s avg: 0.32ms, total read: 2500000000 elm (class org.apache.lucene.util.packed.Direct8, 517.13kb)

        As you can see, the bulk read API for the Direct8 class starts to pay off when reading 4 elements or more. For Packed64 it is still slower when reading 16 elements at a time; on my MacBook Air it only started paying off at around 32 elements. This may be different on systems with more CPU cache.

        Those tests (and many more I played with) are of course very synthetic. I also ran the new classes under our implementation of faceted search. The results were similar.
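        The bulk-get win comes from computing the word position once and then decoding consecutive values in a tight loop, instead of redoing the index arithmetic on every call. A minimal sketch of that idea (a hypothetical layout of 21 three-bit values per long, not Lucene's actual decoder):

```java
// Illustrative sketch of why bulk gets amortize work: singleGet recomputes
// word/shift per element, bulkGet computes the position once and then runs
// a tight shift-and-mask loop. Not Lucene's implementation.
class BulkGetSketch {

    static final int BITS = 3;
    static final long MASK = (1L << BITS) - 1;

    // Per-element get: full index arithmetic on every call.
    // Layout: 21 three-bit values per long (63 used bits, no straddling).
    static int singleGet(long[] packed, int i) {
        int perWord = 64 / BITS;                     // 21 values per word
        long word = packed[i / perWord];
        return (int) ((word >>> ((i % perWord) * BITS)) & MASK);
    }

    // Bulk get: position computed once, then consecutive values are peeled
    // off the current word until it is exhausted.
    static void bulkGet(long[] packed, int start, int[] dst, int len) {
        int perWord = 64 / BITS;
        int word = start / perWord, off = start % perWord;
        long current = packed[word] >>> (off * BITS);
        for (int k = 0; k < len; k++) {
            dst[k] = (int) (current & MASK);
            current >>>= BITS;
            if (++off == perWord) {                  // advance to the next word
                off = 0;
                current = packed[++word];
            }
        }
    }
}
```

        For byte-aligned storage the bulk path can degenerate into a straight array copy, which is consistent with Direct8 benefiting from bulk reads sooner than Packed64 in the numbers above.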

        Adrien Grand added a comment -

        Thanks Boaz, the patch looks very good!

        • I like the fact that the addition of the new bulk API helped make fillValues final!
        • OrdinalMap.subIndexes, SortedDocValuesWriter.pending and SortedSetDocValuesWriter.pending are 0-based, so they could use the new AppendingPackedLongBuffer instead of AppendingDeltaPackedLongBuffer. Can you update the patch?
        Boaz Leskes added a comment - - edited

        Based on Adrien's comments, I changed MultiDocValues.subIndexes, SortedDocValuesWriter.pending, and SortedSetDocValuesWriter.pending to use the new AppendingPackedLongBuffer (see v2 patch).

        ASF subversion and git services added a comment -

        Commit 1508423 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1508423 ]

        LUCENE-5145: AppendingPackedLongBuffer and added support for bulk get operations to the Appending*Buffers.

        Introduced bulk retrieval to the AbstractAppendingLongBuffer
        classes, for faster retrieval. Introduced a new variant,
        AppendingPackedLongBuffer, which relies solely on PackedInts as a backend.
        This new class is useful where people have non-negative numbers with a
        uniform distribution over a fixed (limited) range, e.g. facet ordinals. To
        distinguish it from AppendingPackedLongBuffer, the delta-based
        AppendingLongBuffer was renamed to AppendingDeltaPackedLongBuffer. Fixed an
        issue with NullReader where it didn't respect its valueCount in bulk gets.

        ASF subversion and git services added a comment -

        Commit 1508430 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1508430 ]

        LUCENE-5145: AppendingPackedLongBuffer and added support for bulk get operations to the Appending*Buffers.

        Introduced bulk retrieval to the AbstractAppendingLongBuffer
        classes, for faster retrieval. Introduced a new variant,
        AppendingPackedLongBuffer, which relies solely on PackedInts as a backend.
        This new class is useful where people have non-negative numbers with a
        uniform distribution over a fixed (limited) range, e.g. facet ordinals. To
        distinguish it from AppendingPackedLongBuffer, the delta-based
        AppendingLongBuffer was renamed to AppendingDeltaPackedLongBuffer. Fixed an
        issue with NullReader where it didn't respect its valueCount in bulk gets.

        Adrien Grand added a comment -

        Committed. Thanks Boaz!

        Adrien Grand added a comment -

        4.5 release -> bulk close


          People

          • Assignee:
            Adrien Grand
          • Reporter:
            Boaz Leskes
          • Votes:
            0
          • Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:

              Development