Details
- Type: Sub-task
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Reviewed
Description
ZStandard supports initializing compressors and decompressors with a precomputed dictionary, which can dramatically improve both compression ratio and speed for tables with small values. For more details, please see The Case For Small Data Compression.
If a table will store many small values, and the user can assemble a representative set of files to train a dictionary on, a dictionary can be trained with the zstd command line utility, available in any zstandard package for your favorite OS:
Training:
$ zstd --maxdict=1126400 --train-fastcover=shrink \
    -o mytable.dict training_files/*
Trying 82 different sets of parameters ...
k=674 d=8 f=20 steps=40 split=75 accel=1
Save dictionary of size 1126400 into file mytable.dict
Deploy the dictionary file to HDFS or S3, etc.
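On HDFS, the deploy step might look like the following (the /zdicts path is illustrative and should match the URI used in the table configuration; an S3 deployment would use `aws s3 cp` and an `s3a://` URI instead):

```shell
# Illustrative target path; use whatever URI the table config will reference.
hdfs dfs -mkdir -p /zdicts
hdfs dfs -put mytable.dict /zdicts/mytable.dict
hdfs dfs -ls /zdicts    # confirm the dictionary is in place
```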
Create the table:
hbase> create 'mytable', ..., CONFIGURATION => {
  'hbase.io.compress.zstd.level' => '6',
  'hbase.io.compress.zstd.dictionary' => 'hdfs://nn/zdicts/mytable.dict' }
Now start storing data. Compression results even for small values will be excellent.
Note: Beware, if the dictionary is lost, the data will not be decompressible.
Issue Links
- depends upon: HBASE-26316 Per-table or per-CF compression codec setting overrides (Resolved)
- relates to: HBASE-26405 IntegrationTestLoadSmallValues (Resolved)
- links to