PARQUET-1866

Replace Hadoop ZSTD with JNI-ZSTD

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.12.0
    • Fix Version/s: 1.12.0
    • Component/s: parquet-mr
    • Labels: None

    Description

      The parquet-mr repo already uses ZSTD-JNI in the parquet-cli project. Using this JNI binding is a cleaner approach than using Hadoop's ZSTD compression because 1) installing Hadoop on a development box is cumbersome, and 2) older versions of Hadoop don't support ZSTD, and upgrading Hadoop is a pain of its own. This Jira is to replace Hadoop ZSTD with ZSTD-JNI in the parquet-hadoop project.

      According to the author of ZSTD-JNI, Flink, Spark, and Cassandra all use ZSTD-JNI for ZSTD compression.

      Another approach is to use https://github.com/airlift/aircompressor, which is a pure Java implementation. However, the compression level does not appear to be adjustable in aircompressor.

            People

              Assignee: Xinli Shang (shangx@uber.com)
              Reporter: Xinli Shang (shangx@uber.com)
              Votes: 0
              Watchers: 3