PARQUET-1866

Replace Hadoop ZSTD with JNI-ZSTD


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.12.0
    • Fix Version/s: 1.12.0
    • Component/s: parquet-mr
    • Labels: None

    Description

      The parquet-mr repo already uses ZSTD-JNI for the parquet-cli project. Using this JNI binding is a cleaner approach than using Hadoop ZSTD compression, because 1) installing Hadoop on a development box is cumbersome, and 2) older versions of Hadoop don't support ZSTD, and upgrading Hadoop is another pain. This Jira is to replace Hadoop ZSTD with ZSTD-JNI in the parquet-hadoop project.

      According to the author of ZSTD-JNI, Flink, Spark, and Cassandra all use ZSTD-JNI for ZSTD.

      Another approach is to use https://github.com/airlift/aircompressor, which is a pure Java implementation, but the compression level does not seem to be adjustable in aircompressor.


          People

            Assignee: Xinli Shang (shangx@uber.com)
            Reporter: Xinli Shang (shangx@uber.com)
            Votes: 0
            Watchers: 2

