HBase / HBASE-26258

Universal compression support

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: HFile, Operability

    Description

      Recent Hadoop 3.x releases made some Hadoop compression codecs more widely available, which HBASE-25940 addressed. This is nice, but it still requires native platform support, which, to state the obvious, is not available on all platforms and architectures, even if native libraries for some are bundled into jars.

      Airlift's aircompressor (https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2 licensed library, for Java 8 and up, available in Maven Central. It provides pure Java implementations of the desirable compression algorithms gzip, lz4, lzo, snappy, and zstd, along with Hadoop compression codecs for each. Its README claims "they are typically 300% faster than the JNI wrappers." (https://github.com/airlift/aircompressor). The library is under active development and has up-to-date releases because it is used by Trino.

      We have another project that depends on universal availability of SNAPPY. I would like to make this change as a general improvement that also satisfies that requirement. (The as yet unnamed project will be contributed later.) Universal ZSTD support would be a very nice-to-have as well.

      Proposed changes:

      • Modify Compression.java such that compression codec implementation classes can be specified by configuration. Currently they are hardcoded as strings.
      • Pull in aircompressor as a 'compile' time dependency so it will be bundled into our build and made available on the server classpath.
      • Modify Compression.java to fall back to an aircompressor pure Java implementation when the schema specifies a compression algorithm and a Hadoop native codec is the configured implementation, but the requisite native support is not available.

      Together, these changes will provide universal (pure Java) support for these compression codecs while retaining the default behavior, which is to load and use the Hadoop native implementations when native support is available. They will also let you override this default if you wish to chase the claimed benefits of the pure Java alternatives.
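
      The configuration-driven selection and fallback proposed above could look roughly like the sketch below. It is not the code in Compression.java. The class name CodecResolver, the configuration key pattern "example.compress.<algorithm>.codec", and the use of class loadability as a stand-in for "native support is available" are all assumptions made for illustration; a real check would have to probe the native library itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: choose a codec class by configuration, with a fallback. */
final class CodecResolver {

  /** Hypothetical configuration key for an algorithm's codec class (not a real HBase key). */
  static String keyFor(String algorithm) {
    return "example.compress." + algorithm + ".codec";
  }

  /**
   * Returns true if the named class can be found on the classpath.
   * Assumption: this stands in for a real "is native support usable" probe.
   */
  static boolean isLoadable(String className) {
    try {
      // initialize=false: locate the class without running its static initializers.
      Class.forName(className, false, CodecResolver.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Resolves the codec class name for an algorithm:
   * 1. use the configured class, or defaultImpl if nothing is configured;
   * 2. if that class is not usable, fall back to fallbackImpl;
   * 3. if neither is usable, fail loudly.
   */
  static String resolve(Map<String, String> conf, String algorithm,
                        String defaultImpl, String fallbackImpl) {
    String preferred = conf.getOrDefault(keyFor(algorithm), defaultImpl);
    if (isLoadable(preferred)) {
      return preferred;
    }
    if (isLoadable(fallbackImpl)) {
      return fallbackImpl;
    }
    throw new IllegalStateException(
        "No usable codec for " + algorithm + ": tried " + preferred + " and " + fallbackImpl);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Prefer the Hadoop codec; the aircompressor codec is the pure Java fallback.
    String hadoopSnappy = "org.apache.hadoop.io.compress.SnappyCodec";
    String airSnappy = "io.airlift.compress.snappy.SnappyCodec";
    conf.put(keyFor("snappy"), hadoopSnappy);
    try {
      System.out.println("snappy -> " + resolve(conf, "snappy", hadoopSnappy, airSnappy));
    } catch (IllegalStateException e) {
      // Reached when neither library is on the classpath.
      System.out.println(e.getMessage());
    }
  }
}
```

      Setting the configured class to the pure Java implementation directly would correspond to the override case described above, where an operator opts out of the native default.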

            People

              Assignee: Andrew Kyle Purtell (apurtell)
              Reporter: Andrew Kyle Purtell (apurtell)