Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 1.12.0
- None
Description
The parquet-mr repo already uses ZSTD-JNI in the parquet-cli project. Using this JNI binding is cleaner than relying on Hadoop's ZSTD compression because 1) installing Hadoop on a development box is cumbersome, and 2) older versions of Hadoop do not support ZSTD, and upgrading Hadoop is a pain of its own. This Jira proposes replacing Hadoop ZSTD with ZSTD-JNI in the parquet-hadoop project.
According to the author of ZSTD-JNI, Flink, Spark, and Cassandra all use ZSTD-JNI for ZSTD compression.
Another approach is to use https://github.com/airlift/aircompressor, which is a pure Java implementation. However, the compression level does not appear to be adjustable in aircompressor.
Issue Links
- Blocked
  - PARQUET-1876 Port ZSTD-JNI support to 1.10.x branch (Open)