  1. HBase
  2. HBASE-26258 Universal compression support
  3. HBASE-26259

Fallback support to pure Java compression


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0, 3.0.0-alpha-2
    • Component/s: Performance
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: This change introduces compression codecs provided with HBase, packaged as new Maven modules. Each module provides compression codec support that formerly required the Hadoop native codecs, which in turn rely on native code integration that may or may not be available on a given hardware platform or in a given operational environment. The HBase distribution now provides codecs for users who, for whatever reason, cannot or do not wish to deploy the Hadoop native codecs.

    Description

      Airlift’s aircompressor (https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2 licensed library for Java 8 and up, available in Maven Central. It provides pure Java implementations of gzip, lz4, lzo, snappy, and zstd, along with Hadoop compression codecs for each. The project claims “they are typically 300% faster than the JNI wrappers” (https://github.com/airlift/aircompressor). The library is under active development and its releases are kept up to date because it is used by Trino.

      Proposed changes:

      • Modify Compression.java so that the compression codec implementation classes can be specified by configuration. Currently the class names are hardcoded as strings.
      • Pull in aircompressor as a ‘compile’ scope dependency so that it is bundled into our build and made available on the server classpath.
      • Modify Compression.java to fall back to an aircompressor pure Java implementation when the schema specifies a compression algorithm and a Hadoop native codec is configured as the desired implementation, but the requisite native support is not available.
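
      The selection logic proposed above could be sketched roughly as follows. This is an illustration only, not the code in Compression.java: the configuration key pattern, the class `CodecFallbackSketch`, the method names, and the `isUsable` check are all hypothetical. The codec class names appear only as strings and are assumed to be the Hadoop and aircompressor Snappy codec classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative sketch only; names and keys here are hypothetical, not HBase API. */
final class CodecFallbackSketch {

    /** Hypothetical configuration key pattern, one key per compression algorithm. */
    static String configKey(String algorithm) {
        return "example.compress." + algorithm + ".codec";
    }

    /**
     * Picks the codec class name for an algorithm.
     * 1. Use the class named in configuration, else the Hadoop default.
     * 2. If that class is not usable (for example, native support is missing),
     *    fall back to the pure Java implementation.
     */
    static String resolveCodecClass(Map<String, String> conf,
                                    String algorithm,
                                    String hadoopDefault,
                                    String pureJavaFallback,
                                    Predicate<String> isUsable) {
        String desired = conf.getOrDefault(configKey(algorithm), hadoopDefault);
        return isUsable.test(desired) ? desired : pureJavaFallback;
    }

    public static void main(String[] args) {
        // Class names are assumptions used only as strings; nothing is loaded here.
        String hadoopSnappy = "org.apache.hadoop.io.compress.SnappyCodec";
        String airliftSnappy = "io.airlift.compress.snappy.SnappyCodec";

        Map<String, String> conf = new HashMap<>();

        // Stand-in for a real availability check: pretend native support is absent,
        // so only the pure Java class counts as usable.
        Predicate<String> nativeMissing = name -> name.equals(airliftSnappy);

        System.out.println(
            resolveCodecClass(conf, "snappy", hadoopSnappy, airliftSnappy, nativeMissing));
    }
}
```

      In a real implementation the `isUsable` predicate would have to probe whether the native library actually loads, which this sketch deliberately leaves abstract.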

      Attachments

        1. ac_lz4_results.pdf (127 kB, Andrew Kyle Purtell)
        2. ac_snappy_results.pdf (127 kB, Andrew Kyle Purtell)
        3. ac_zstd_results.pdf (127 kB, Andrew Kyle Purtell)
        4. BenchmarkCodec.java (5 kB, Andrew Kyle Purtell)
        5. BenchmarksMain.java (2 kB, Andrew Kyle Purtell)
        6. lz4_lz4-java_result.pdf (127 kB, Andrew Kyle Purtell)
        7. RandomDistribution.java (6 kB, Andrew Kyle Purtell)
        8. xerial_snappy_results.pdf (127 kB, Andrew Kyle Purtell)


          People

            Assignee: Andrew Kyle Purtell (apurtell)
            Reporter: Andrew Kyle Purtell (apurtell)
            Votes: 0
            Watchers: 7
