  1. HBase
  2. HBASE-26258 Universal compression support
  3. HBASE-26259

Fallback support to pure Java compression

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0, 3.0.0-alpha-2
    • Component/s: Performance
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      This change introduces provided compression codecs to HBase as new Maven modules. Each module provides compression codec support that formerly required the Hadoop native codecs, which in turn rely on native code integration that may or may not be available on a given hardware platform or in an operational environment. We now provide codecs in the HBase distribution for users who, for whatever reason, cannot or do not wish to deploy the Hadoop native codecs.

      Description

      Airlift’s aircompressor (https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2 licensed library for Java 8 and up, available in Maven Central. It provides pure Java implementations of gzip, lz4, lzo, snappy, and zstd, as well as Hadoop compression codecs for each, and claims “they are typically 300% faster than the JNI wrappers.” (https://github.com/airlift/aircompressor). The library is under active development and is released regularly because it is used by Trino.

      Proposed changes:

      • Modify Compression.java such that compression codec implementation classes can be specified by configuration. Currently they are hardcoded as strings.
      • Pull in aircompressor as a ‘compile’ time dependency so it will be bundled into our build and made available on the server classpath.
      • Modify Compression.java to fall back to an aircompressor pure Java implementation when the schema specifies a compression algorithm and a Hadoop native codec is configured as the desired implementation, but the requisite native support is not available.
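
      The first and third proposed changes can be illustrated with a minimal sketch. It is not the code in Compression.java: the configuration key, the class CodecFallbackSketch, the method resolveCodecClass, and the isUsable check are all hypothetical names introduced here, and the two codec class names are used only as plain strings, so nothing is loaded from Hadoop or aircompressor. The sketch reads the implementation class from configuration and falls back to the pure Java class when the configured one is reported unusable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class CodecFallbackSketch {

    // Hypothetical configuration key, used only for illustration.
    static final String SNAPPY_CODEC_KEY = "example.compress.snappy.codec";

    // Codec class names treated as plain strings; this sketch never loads them.
    static final String HADOOP_NATIVE_SNAPPY = "org.apache.hadoop.io.compress.SnappyCodec";
    static final String AIRCOMPRESSOR_SNAPPY = "io.airlift.compress.snappy.SnappyCodec";

    /**
     * Returns the configured implementation class name if it is usable,
     * otherwise the pure Java fallback. The isUsable predicate is an
     * assumption standing in for a real "is native support present" check.
     */
    static String resolveCodecClass(Map<String, String> conf, String key,
            String defaultImpl, String pureJavaFallback, Predicate<String> isUsable) {
        // The implementation comes from configuration, not a hardcoded string.
        String configured = conf.getOrDefault(key, defaultImpl);
        if (isUsable.test(configured)) {
            return configured;
        }
        // The configured codec cannot be used: try the pure Java one.
        if (isUsable.test(pureJavaFallback)) {
            return pureJavaFallback;
        }
        throw new IllegalStateException("No usable codec for " + key
                + " (tried " + configured + " and " + pureJavaFallback + ")");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SNAPPY_CODEC_KEY, HADOOP_NATIVE_SNAPPY);

        // Simulate a host where the Hadoop native libraries are missing.
        Predicate<String> nativeMissing = name -> !name.startsWith("org.apache.hadoop.");

        System.out.println(resolveCodecClass(conf, SNAPPY_CODEC_KEY,
                HADOOP_NATIVE_SNAPPY, AIRCOMPRESSOR_SNAPPY, nativeMissing));
        // prints io.airlift.compress.snappy.SnappyCodec
    }
}
```

      Passing the usability check as a predicate keeps the selection logic separate from however native support is actually detected, which this sketch does not attempt to model.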

        Attachments

        1. xerial_snappy_results.pdf
          127 kB
          Andrew Kyle Purtell
        2. BenchmarksMain.java
          2 kB
          Andrew Kyle Purtell
        3. BenchmarkCodec.java
          5 kB
          Andrew Kyle Purtell
        4. RandomDistribution.java
          6 kB
          Andrew Kyle Purtell
        5. ac_lz4_results.pdf
          127 kB
          Andrew Kyle Purtell
        6. ac_snappy_results.pdf
          127 kB
          Andrew Kyle Purtell
        7. ac_zstd_results.pdf
          127 kB
          Andrew Kyle Purtell
        8. lz4_lz4-java_result.pdf
          127 kB
          Andrew Kyle Purtell

              People

              • Assignee:
                apurtell Andrew Kyle Purtell
                Reporter:
                apurtell Andrew Kyle Purtell
              • Votes:
                0
                Watchers:
                7
