HBase / HBASE-26258 Universal compression support / HBASE-26353

Support loadable dictionaries in hbase-compression-zstd

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0, 3.0.0-alpha-2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Zstandard supports initializing compressors and decompressors with a precomputed dictionary, which can dramatically improve both the compression ratio and the compression speed for tables with small values. For more details, please see The Case For Small Data Compression.

      If a table is going to hold a lot of small values, and you can put together a representative set of sample files, you can train a dictionary for compressing those values with the zstd command line utility, available in the zstandard package for your favorite OS:

      Training:

      $ zstd --maxdict=1126400 --train-fastcover=shrink \
          -o mytable.dict training_files/*
      Trying 82 different sets of parameters
      ...
      k=674                                      
      d=8
      f=20
      steps=40
      split=75
      accel=1
      Save dictionary of size 1126400 into file mytable.dict
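
Before deploying the trained file, it may be worth a quick sanity check. The sketch below is not part of this change: the helper names `is_zstd_dict` and `dict_gain` are hypothetical. The first checks that the file starts with the zstd dictionary magic number (0xEC30A437, stored little-endian); the second prints the compressed size of one sample file with and without the dictionary, using the standard `-D` and `-c` options of the zstd utility. Sizes measured on a single file are only a rough indication of what HBase will achieve.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of zstd or HBase.

# Succeed if FILE starts with the zstd dictionary magic number 0xEC30A437,
# which is stored little-endian as the bytes 37 a4 30 ec.
is_zstd_dict() {
  magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "37a430ec" ]
}

# Print the compressed size of SAMPLE at level 6, without and with DICT.
# Requires the zstd command line utility.
dict_gain() {
  dict=$1
  sample=$2
  plain=$(zstd -6 -c "$sample" | wc -c | tr -d ' ')
  with_dict=$(zstd -6 -D "$dict" -c "$sample" | wc -c | tr -d ' ')
  echo "without dictionary: $plain bytes, with dictionary: $with_dict bytes"
}
```

For example, `is_zstd_dict mytable.dict && dict_gain mytable.dict <sample>`, where `<sample>` is ideally a file that was held out of the training set.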
      

      Deploy the dictionary file to HDFS or S3, etc.
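
For HDFS, the deployment step could look like the following sketch. The helper name `deploy_dict` and the `DRY_RUN` switch are hypothetical, and the destination directory is taken from the example URI used in the table definition; an S3 deployment would use that store's own tooling instead. The copy deliberately omits `-f`, so `hdfs dfs -put` refuses to overwrite a dictionary that already exists at the destination.

```shell
#!/bin/sh
# Sketch only: deploy_dict and DRY_RUN are hypothetical names.

# Copy DICT into DEST_DIR on HDFS and list the result.
# Set DRY_RUN=1 to print the commands instead of running them.
deploy_dict() {
  dict=$1
  dest_dir=$2
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
  $run hdfs dfs -mkdir -p "$dest_dir" &&
  # No -f here: an existing dictionary must never be silently replaced.
  $run hdfs dfs -put "$dict" "$dest_dir/" &&
  $run hdfs dfs -ls "$dest_dir/$(basename "$dict")"
}
```

For example, `deploy_dict mytable.dict hdfs://nn/zdicts` would place the file at the path referenced by 'hbase.io.compress.zstd.dictionary' in the example table definition.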

      Create the table:

      hbase> create "mytable", 
        ... ,
        CONFIGURATION => {
          'hbase.io.compress.zstd.level' => '6',
          'hbase.io.compress.zstd.dictionary' => 'hdfs://nn/zdicts/mytable.dict'
        }
      

      Now start storing data. Compression results even for small values will be excellent.

      Note: Beware, if the dictionary is lost, the data cannot be decompressed.
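
One way to guard against silent loss or corruption is to keep a backup copy of the dictionary together with a checksum. The sketch below is an illustration only: the helper names are hypothetical, and it assumes the `sha256sum` utility from GNU coreutils is available.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; assumes GNU coreutils sha256sum.

# Write a SHA-256 checksum file next to the dictionary.
record_dict_checksum() {
  sha256sum "$1" > "$1.sha256"
}

# Succeed only if the dictionary still matches its recorded checksum.
verify_dict_checksum() {
  sha256sum -c "$1.sha256" > /dev/null 2>&1
}
```

Running `record_dict_checksum mytable.dict` once after training, and `verify_dict_checksum mytable.dict` against any backup or restored copy, gives a cheap check that the file is byte-for-byte the one the data was compressed with.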

            People

              Assignee: Andrew Kyle Purtell (apurtell)
              Reporter: Andrew Kyle Purtell (apurtell)