[PARQUET-1866] Replace Hadoop ZSTD with JNI-ZSTD - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.12.0
Fix Version/s: 1.12.0
Component/s: parquet-mr
Labels: None

Description

The parquet-mr repo already uses ZSTD-JNI in the parquet-cli project. Using this JNI binding is cleaner than using Hadoop's ZSTD compression because 1) installing Hadoop on a development box is cumbersome, and 2) older versions of Hadoop don't support ZSTD, and upgrading Hadoop is another pain. This Jira is to replace Hadoop ZSTD with ZSTD-JNI in the parquet-hadoop project.

According to the author of ZSTD-JNI, Flink, Spark, and Cassandra all use ZSTD-JNI for ZSTD compression.

Another approach is to use https://github.com/airlift/aircompressor, which is a pure Java implementation. However, the compression level does not appear to be adjustable in aircompressor.
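
To illustrate the compression-level point: with a ZSTD-JNI based codec, the level can be exposed as an ordinary Hadoop configuration property. The fragment below is only a sketch. It assumes the property name `parquet.compression.codec.zstd.level` as documented for later parquet-hadoop releases, and the value 6 is an arbitrary choice for illustration, so both should be checked against the version in use.

```xml
<!-- Sketch only: property name assumed from later parquet-hadoop documentation -->
<property>
  <name>parquet.compression.codec.zstd.level</name>
  <!-- Illustrative value; higher levels trade CPU time for a smaller output -->
  <value>6</value>
</property>
```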

Issue Links

Blocked

PARQUET-1876 Port ZSTD-JNI support to 1.10.x branch

Open

People

Assignee: Xinli Shang

Reporter: Xinli Shang

Votes: 0

Watchers: 3

Dates

Created: 21/May/20 19:30

Updated: 23/Jun/24 03:31

Resolved: 03/Jun/20 07:17