  Lucene - Core / LUCENE-10112

Improve LZ4 Compression performance with direct primitive read/writes

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0
    • Component/s: core/other
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      Summary

      Java 9 introduced VarHandles as a tool to quickly read and write primitive types directly from and to byte arrays, using a single bounds-checked access instead of assembling each value one byte at a time. The LZ4 compressor class repeatedly reads ints from a byte array to analyze matches, so its performance can be improved by reading these ints through a VarHandle.
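
      As a rough illustration, the class below contrasts a hand-written byte-by-byte int read with a single read through MethodHandles.byteArrayViewVarHandle in the platform's native order. It is a sketch and not the code from the attached patch. The class and method names are hypothetical. The VarHandle access is still bounds-checked: an index that leaves fewer than four bytes in the array throws IndexOutOfBoundsException.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class VarHandleIntRead {
  // Hypothetical helper: views a byte[] as ints in the platform's native byte order.
  static final VarHandle NATIVE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());

  // Baseline: assemble a big-endian int from four individual byte loads.
  static int readIntBytes(byte[] buf, int i) {
    return ((buf[i] & 0xFF) << 24)
        | ((buf[i + 1] & 0xFF) << 16)
        | ((buf[i + 2] & 0xFF) << 8)
        | (buf[i + 3] & 0xFF);
  }

  // One bounds-checked VarHandle access reading the four bytes at i (no alignment needed).
  static int readIntVarHandle(byte[] buf, int i) {
    return (int) NATIVE_INT.get(buf, i);
  }

  public static void main(String[] args) {
    byte[] data = "abcdXabcdY".getBytes(StandardCharsets.US_ASCII);
    // An LZ4-style match check: do the 4 bytes at positions 0 and 5 agree?
    boolean byBytes = readIntBytes(data, 0) == readIntBytes(data, 5);
    boolean byVarHandle = readIntVarHandle(data, 0) == readIntVarHandle(data, 5);
    System.out.println(byBytes + " " + byVarHandle);
  }
}
```

      Running main prints "true true": the four bytes at positions 0 and 5 ("abcd") are identical, and both strategies detect the match.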

      Additionally, the LZ4 compressor/decompressor methods currently read and write the bytes of little-endian (LE) shorts individually. Lucene's DataOutput/DataInput abstractions already have dedicated methods for reading/writing LE shorts. These methods are optimized in certain implementations and should perform better than individual byte reads.
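
      For context, the LZ4 block format stores each match offset as a 2-byte little-endian value. The sketch below decodes such a value one byte at a time and then with a single little-endian VarHandle read. It uses hypothetical names and works on a raw byte[] rather than Lucene's DataInput. Both versions mask with 0xFFFF so they return the unsigned offset.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class LeShortRead {
  // Little-endian short view over a byte[]; the order is fixed to LE, not native.
  static final VarHandle LE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

  // Baseline: decode an unsigned 16-bit LE value one byte at a time (low byte first).
  static int readShortLEBytes(byte[] buf, int i) {
    return (buf[i] & 0xFF) | ((buf[i + 1] & 0xFF) << 8);
  }

  // Same value with a single VarHandle access, masked to treat it as unsigned.
  static int readShortLEVarHandle(byte[] buf, int i) {
    return ((short) LE_SHORT.get(buf, i)) & 0xFFFF;
  }

  public static void main(String[] args) {
    // 0x34 0x12 encodes 0x1234 (4660) in little-endian order.
    byte[] buf = {(byte) 0x34, (byte) 0x12, (byte) 0xFF, (byte) 0xFF};
    System.out.println(readShortLEBytes(buf, 0) + " " + readShortLEVarHandle(buf, 0));
    System.out.println(readShortLEBytes(buf, 2) + " " + readShortLEVarHandle(buf, 2));
  }
}
```

      For the bytes 0x34 0x12, both methods return 0x1234 (4660). The VarHandle read also works at unaligned indexes, because plain get/set accesses on a byte array view do not require alignment.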

      Concerns

      The DataInput.readShort() and DataOutput.writeShort() methods do not document that they are LE; judging from the implementations, it just looks to me like DataInput/DataOutput are LE. Since this particular change does not appear to provide a significant performance win, maybe the patch should keep the explicit individual byte reads?
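
      One way to settle the byte-order question empirically is to write a known probe value and check which byte comes first. The sketch below does not use Lucene's classes. It contrasts java.io.DataOutputStream, whose writeShort is specified as big-endian by the java.io.DataOutput contract, with a ByteBuffer explicitly set to little-endian. The helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ShortByteOrderCheck {
  // Bytes produced by java.io.DataOutputStream.writeShort (big-endian per java.io.DataOutput).
  static byte[] viaDataOutputStream(short v) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeShort(v);
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for an in-memory stream
    }
    return bytes.toByteArray();
  }

  // Bytes produced by a heap ByteBuffer explicitly set to little-endian.
  static byte[] viaLittleEndianBuffer(short v) {
    return ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN).putShort(v).array();
  }

  // For the probe value 0x1234, an LE writer emits the low byte 0x34 first.
  static boolean isLittleEndian(byte[] written) {
    return written[0] == (byte) 0x34 && written[1] == (byte) 0x12;
  }

  public static void main(String[] args) {
    short probe = 0x1234;
    System.out.println(isLittleEndian(viaDataOutputStream(probe)));   // false
    System.out.println(isLittleEndian(viaLittleEndianBuffer(probe))); // true
  }
}
```

      The same probe, written by a Lucene DataOutput implementation, would show directly which order its writeShort uses.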

      Additionally, this patch reads ints in the platform's native byte order. That should be fine, since the ints are just used for matching bytes. But I can change it to read them in the same order as the previous version did.
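
      The claim that native order is safe for matching can be checked directly. Any fixed byte order maps four bytes to an int one-to-one, so two 4-byte windows are equal exactly when their ints are equal. The sketch below uses hypothetical names and compares native-order and big-endian int comparisons against a raw byte comparison on random windows. One caveat applies if the same int also feeds a hash function: the hash values themselves depend on byte order. In that case the matches chosen could differ between little- and big-endian platforms, even though each match found is still a real match.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Random;

final class NativeOrderMatch {
  static final VarHandle NATIVE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());
  static final VarHandle BE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

  // True if the 4-byte windows at i and j are identical, compared as native-order ints.
  static boolean matchNative(byte[] b, int i, int j) {
    return (int) NATIVE_INT.get(b, i) == (int) NATIVE_INT.get(b, j);
  }

  // Same check using big-endian ints.
  static boolean matchBigEndian(byte[] b, int i, int j) {
    return (int) BE_INT.get(b, i) == (int) BE_INT.get(b, j);
  }

  // Reference answer: compare the raw bytes directly.
  static boolean matchBytes(byte[] b, int i, int j) {
    return Arrays.equals(b, i, i + 4, b, j, j + 4);
  }

  // Runs all three checks on random window pairs and counts any disagreement.
  static int countDisagreements(byte[] data, Random random, int trials) {
    int disagreements = 0;
    for (int t = 0; t < trials; t++) {
      int i = random.nextInt(data.length - 3);
      int j = random.nextInt(data.length - 3);
      boolean expected = matchBytes(data, i, j);
      if (matchNative(data, i, j) != expected || matchBigEndian(data, i, j) != expected) {
        disagreements++;
      }
    }
    return disagreements;
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    byte[] data = new byte[4096];
    for (int k = 0; k < data.length; k++) {
      data[k] = (byte) random.nextInt(4); // small alphabet so that matches are common
    }
    System.out.println("disagreements: " + countDisagreements(data, random, 100_000));
  }
}
```

      The count is zero for any seed, because equality of ints in any single fixed order is equivalent to equality of the underlying bytes.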

      Benchmarks

      I created JMH benchmarks that compress 1 MB of highly compressible JSON observability data, 64 KB at a time. To simulate the "short" changes, I use a forked version of `ByteArrayDataOutput` that writes shorts using a VarHandle (simulating the fast writes that the ByteBuffer-based implementations would get). I also ran a benchmark without the short changes, i.e. with only the VarHandle int reads.
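
      The forked class used for the benchmark is not shown in the issue. The class below is a hypothetical stand-in that illustrates the idea: a byte[]-backed output whose writeShort stores both bytes with one little-endian VarHandle set, placed next to the byte-by-byte baseline. The class and method names are illustrative and are not Lucene's API.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical minimal byte[]-backed output, loosely modeled on a ByteArrayDataOutput. */
final class VarHandleShortOutput {
  private static final VarHandle LE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

  private final byte[] bytes;
  private int pos;

  VarHandleShortOutput(byte[] bytes) {
    this.bytes = bytes;
  }

  void writeByte(byte b) {
    bytes[pos++] = b;
  }

  // Baseline: two single-byte writes, low byte first (little-endian).
  void writeShortBytes(short s) {
    writeByte((byte) s);
    writeByte((byte) (s >>> 8));
  }

  // Fast path: one little-endian VarHandle store of both bytes.
  void writeShort(short s) {
    LE_SHORT.set(bytes, pos, s);
    pos += 2;
  }

  int getPosition() {
    return pos;
  }

  public static void main(String[] args) {
    byte[] a = new byte[4];
    byte[] b = new byte[4];
    VarHandleShortOutput slow = new VarHandleShortOutput(a);
    VarHandleShortOutput fast = new VarHandleShortOutput(b);
    for (short s : new short[] {(short) 0x1234, (short) 0xBEEF}) {
      slow.writeShortBytes(s);
      fast.writeShort(s);
    }
    System.out.println(Arrays.equals(a, b) + " " + fast.getPosition());
  }
}
```

      Both writers produce identical bytes (for example 0xEF 0xBE for 0xBEEF). The fork therefore changes only how the bytes are written, not the resulting format.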

      Benchmark                                          Mode  Cnt    Score   Error  Units
      MyBenchmark.testCompressLuceneLZ4                 thrpt    9  712.430 ± 3.616  ops/s
      MyBenchmark.testCompressLuceneLZ4Forked           thrpt    9  945.380 ± 4.776  ops/s
      MyBenchmark.testCompressLuceneLZ4ForkedNoShort    thrpt    9  940.812 ± 3.868  ops/s
      MyBenchmark.testCompressLuceneLZ4HC               thrpt    9  147.432 ± 4.730  ops/s
      MyBenchmark.testCompressLuceneLZ4HCForked         thrpt    9  183.954 ± 2.534  ops/s
      MyBenchmark.testCompressLuceneLZ4HCForkedNoShort  thrpt    9  188.065 ± 0.727  ops/s
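
      The relative speedups can be computed from the mean scores to make the table easier to read. This small helper (hypothetical name) divides each variant's throughput by its baseline and ignores the error bars.

```java
import java.util.Locale;

final class BenchmarkSpeedups {
  // Relative throughput of a variant versus its baseline (1.0 = no change).
  static double speedup(double variantOpsPerSec, double baselineOpsPerSec) {
    return variantOpsPerSec / baselineOpsPerSec;
  }

  public static void main(String[] args) {
    // Mean scores (ops/s) copied from the JMH table.
    double lz4 = 712.430, lz4Forked = 945.380, lz4NoShort = 940.812;
    double hc = 147.432, hcForked = 183.954, hcNoShort = 188.065;
    System.out.printf(Locale.ROOT, "LZ4   forked %.2fx, no-short %.2fx%n",
        speedup(lz4Forked, lz4), speedup(lz4NoShort, lz4));
    System.out.printf(Locale.ROOT, "LZ4HC forked %.2fx, no-short %.2fx%n",
        speedup(hcForked, hc), speedup(hcNoShort, hc));
  }
}
```

      This gives about 1.33x (Forked) and 1.32x (NoShort) for LZ4, and about 1.25x and 1.28x for LZ4HC. Nearly all of the gain therefore appears to come from the VarHandle int reads rather than from the short changes.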

      Attachments

        1. LUCENE-10112.patch (2 kB, Tim Brooks)

            People

              Assignee: Uwe Schindler (uschindler)
              Reporter: Tim Brooks (timbrooks)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m