Details
- Type: Sub-task
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Reviewed
Description
ZStandard supports initializing compressors and decompressors with a precomputed dictionary, which can dramatically improve both compression ratio and speed for tables with small values. For more details, please see The Case For Small Data Compression.
If a table will store many small values, and the user can assemble a representative set of files to train a dictionary on, a dictionary can be trained with the zstd command line utility, available in any zstandard package for your favorite OS:
Training:
$ zstd --maxdict=1126400 --train-fastcover=shrink \
    -o mytable.dict training_files/*
Trying 82 different sets of parameters ...
k=674 d=8 f=20 steps=40 split=75 accel=1
Save dictionary of size 1126400 into file mytable.dict
Deploy the dictionary file to HDFS or S3, etc.
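On HDFS, the deploy step might look like the following (the /zdicts path is illustrative and should match the URI used in the table configuration; an S3 deployment would use `aws s3 cp` and an `s3a://` URI instead):

```shell
# Illustrative target path; use whatever URI the table config will reference.
hdfs dfs -mkdir -p /zdicts
hdfs dfs -put mytable.dict /zdicts/mytable.dict
hdfs dfs -ls /zdicts    # confirm the dictionary is in place
```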
Create the table:
hbase> create 'mytable', ..., CONFIGURATION => {
  'hbase.io.compress.zstd.level' => '6',
  'hbase.io.compress.zstd.dictionary' => 'hdfs://nn/zdicts/mytable.dict' }
Now start storing data. Compression results even for small values will be excellent.
Note: Beware, if the dictionary is lost, the data will not be decompressible.
Issue Links
- depends upon: HBASE-26316 Per-table or per-CF compression codec setting overrides (Resolved)
- relates to: HBASE-26405 IntegrationTestLoadSmallValues (Resolved)
- links to