  HBase / HBASE-26258 Universal compression support / HBASE-26316

Per-table or per-CF compression codec setting overrides


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.5.0, 3.0.0-alpha-2
    • Fix Version/s: 2.5.0, 3.0.0-alpha-2
    • Component/s: HFile, Operability
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      It is now possible to specify codec configuration options as part of table or column family schema definitions. The configuration options apply only to the scope in which they are defined. For example:

        hbase> create 'sometable', \
          { NAME => 'somefamily', COMPRESSION => 'ZSTD' }, \
          CONFIGURATION => { 'hbase.io.compress.zstd.level' => '9' }
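      The scoping rule can be pictured as a layered lookup in which a more specific scope overrides a less specific one. The Java sketch below is a hypothetical, self-contained model, not HBase code: the class ScopedConf, the "unset" fallback, and the site-wide level of 3 are illustrative assumptions. It assumes the layer order site configuration, then table, then column family, with the most recently added layer consulted first, which is how HBase's CompoundConfiguration is generally understood to resolve keys for a Store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of scoped codec settings (not HBase code). Layers are
 * added from least to most specific; lookups check the newest layer first.
 */
public class ScopedConf {
    static final String ZSTD_LEVEL = "hbase.io.compress.zstd.level";

    private final List<Map<String, String>> layers = new ArrayList<>();

    /** Adds a layer that overrides every layer added before it. */
    public ScopedConf add(Map<String, String> layer) {
        layers.add(layer);
        return this;
    }

    /** Returns the value from the most specific layer defining key, else fallback. */
    public String get(String key, String fallback) {
        for (int i = layers.size() - 1; i >= 0; i--) {
            String value = layers.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Map<String, String> site = Map.of(ZSTD_LEVEL, "3");      // assumed site-wide setting
        Map<String, String> someTable = Map.of(ZSTD_LEVEL, "9"); // CONFIGURATION on 'sometable'
        Map<String, String> noOverride = Map.of();               // a scope with no settings

        // Layer order in this model: site, then table, then column family.
        ScopedConf tableScoped = new ScopedConf().add(site).add(someTable).add(noOverride);
        ScopedConf familyScoped = new ScopedConf().add(site).add(someTable).add(Map.of(ZSTD_LEVEL, "1"));
        ScopedConf otherTable = new ScopedConf().add(site).add(noOverride).add(noOverride);

        System.out.println(tableScoped.get(ZSTD_LEVEL, "unset"));  // prints 9
        System.out.println(familyScoped.get(ZSTD_LEVEL, "unset")); // prints 1
        System.out.println(otherTable.get(ZSTD_LEVEL, "unset"));   // prints 3
    }
}
```

      In this model, the table that sets the level to 9 sees 9, a family-level override wins over the table value, and a table without any override falls back to the site-wide value.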

    Description

      This won't work as expected today...

      hbase> create 'sometable', \
        { NAME => 'somefamily', VERSIONS => 1000, COMPRESSION => 'ZSTD' }, \
        CONFIGURATION => { 'hbase.io.compress.zstd.level' => '9' }
      

      ... but it should. We obtain and retain Compressor instances in HFileBlockDefaultEncodingContext, and in theory we could call Compressor#reinit when setting up the context to update compression parameters, such as compression level and buffer size, from the ambient configuration. However, we do not plumb the CompoundConfiguration from the Store through to HFileBlockDefaultEncodingContext, so codec parameters can only be changed globally, in the system site configuration files.

      This matters a great deal for algorithms like ZSTD, which offers more than 20 compression levels. At level 1 ZSTD compresses almost as fast as LZ4; at levels above 19 it uses computationally expensive techniques to rival LZMA in compression ratio, at the cost of poor compression speed. The ZSTD level you would want for a given table's data will very likely vary by use case.
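      The plumbing this issue asks for can be sketched as a retained compressor that is reinitialized from the Store-scoped settings whenever an encoding context is set up. This is a hypothetical Java model, not the HBase or Hadoop implementation: ModelCompressor stands in for a retained Compressor, ModelEncodingContext for HFileBlockDefaultEncodingContext, a plain Map for the Store's CompoundConfiguration, and the default level of 3 is an assumption of the model.

```java
import java.util.Map;

/**
 * Hypothetical model (not HBase or Hadoop code) of the requested plumbing: a
 * retained compressor is reinitialized from the Store's scoped settings each
 * time an encoding context is set up, in the spirit of Compressor#reinit.
 */
public class ReinitSketch {
    static final String ZSTD_LEVEL = "hbase.io.compress.zstd.level";
    static final int DEFAULT_LEVEL = 3; // assumed default, for this model only

    /** Stand-in for a pooled compressor whose parameters can be reset. */
    static final class ModelCompressor {
        int level = DEFAULT_LEVEL;

        /** Resets parameters from conf; keys that are absent revert to defaults. */
        void reinit(Map<String, String> conf) {
            String value = conf.get(ZSTD_LEVEL);
            level = (value == null) ? DEFAULT_LEVEL : Integer.parseInt(value.trim());
        }
    }

    /** Stand-in for an encoding context that retains a compressor. */
    static final class ModelEncodingContext {
        private final ModelCompressor compressor;

        ModelEncodingContext(ModelCompressor retained, Map<String, String> storeConf) {
            this.compressor = retained;
            // The missing step: apply the Store's table/CF-scoped settings
            // before this context compresses any blocks.
            this.compressor.reinit(storeConf);
        }

        int level() {
            return compressor.level;
        }
    }

    public static void main(String[] args) {
        ModelCompressor retained = new ModelCompressor();

        ModelEncodingContext someTable =
            new ModelEncodingContext(retained, Map.of(ZSTD_LEVEL, "9"));
        System.out.println(someTable.level()); // prints 9

        // Reusing the same compressor for a store without an override resets
        // it to the default, so the previous table's level does not leak.
        ModelEncodingContext otherTable = new ModelEncodingContext(retained, Map.of());
        System.out.println(otherTable.level()); // prints 3
    }
}
```

      Resetting absent keys to defaults, rather than keeping whatever the previous user configured, is the design choice that keeps one table's level from leaking into another table that reuses the same compressor instance.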


            People

              Assignee: Andrew Kyle Purtell
              Reporter: Andrew Kyle Purtell
              Votes: 0
              Watchers: 3
