Parquet / PARQUET-1241

[C++] Use LZ4 frame format


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-cpp
    • Labels: None

    Description

      The parquet-format spec doesn't currently specify whether LZ4-compressed data should use the LZ4 frame format or the raw block format. We should choose one and state it explicitly in the spec, because the two formats are not interoperable. After some discussion with others [1], we think it would be beneficial to use the frame format, which adds a small header in exchange for more self-contained decompression and a richer feature set (checksums, parallel decompression, etc.).
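      To make the header cost and the extra features concrete, here is a minimal C++ sketch that parses an LZ4 frame header and reports which optional features the frame enables. It is written against the published LZ4 Frame Format description, not taken from parquet-cpp or the lz4 library; `ParseLz4FrameHeader` and `Lz4FrameHeader` are hypothetical names, and the header checksum byte is deliberately not verified (that would need an xxHash32 implementation). Raw block-format data carries no such header, which is one reason a frame decoder and a block decoder cannot read each other's output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Magic number that opens every LZ4 frame, stored little-endian (bytes 04 22 4D 18).
constexpr std::uint32_t kLz4FrameMagic = 0x184D2204;

// Fields decoded from the frame header: magic, FLG, BD, optional fields, HC.
struct Lz4FrameHeader {
  bool independent_blocks = false;    // FLG bit 5: blocks decodable independently
  bool block_checksum = false;        // FLG bit 4: 4-byte checksum after each block
  bool content_size_present = false;  // FLG bit 3: 8-byte original size in header
  bool content_checksum = false;      // FLG bit 2: 4-byte checksum after last block
  bool dict_id_present = false;       // FLG bit 0: 4-byte dictionary ID in header
  std::size_t block_max_size = 0;     // decoded from BD bits 6-4
  std::uint64_t content_size = 0;     // meaningful only if content_size_present
  std::size_t header_size = 0;        // bytes before the first data block
};

// Reads n bytes (n <= 8) starting at p as a little-endian unsigned integer.
inline std::uint64_t ReadLE(const std::uint8_t* p, int n) {
  std::uint64_t v = 0;
  for (int i = n - 1; i >= 0; --i) v = (v << 8) | p[i];
  return v;
}

// Returns the parsed header, or std::nullopt if `data` does not begin with a
// plausible LZ4 frame header. The header checksum byte (HC) is not verified.
std::optional<Lz4FrameHeader> ParseLz4FrameHeader(const std::uint8_t* data,
                                                  std::size_t size) {
  // Smallest possible header: 4 (magic) + 1 (FLG) + 1 (BD) + 1 (HC) bytes.
  if (size < 7 || ReadLE(data, 4) != kLz4FrameMagic) return std::nullopt;

  const std::uint8_t flg = data[4];
  const std::uint8_t bd = data[5];
  if ((flg >> 6) != 0x01) return std::nullopt;  // version bits must be 01
  if ((flg & 0x02) != 0 || (bd & 0x8F) != 0) return std::nullopt;  // reserved bits

  Lz4FrameHeader h;
  h.independent_blocks = (flg & 0x20) != 0;
  h.block_checksum = (flg & 0x10) != 0;
  h.content_size_present = (flg & 0x08) != 0;
  h.content_checksum = (flg & 0x04) != 0;
  h.dict_id_present = (flg & 0x01) != 0;

  // Block maximum size ID: 4 -> 64 KB, 5 -> 256 KB, 6 -> 1 MB, 7 -> 4 MB.
  const unsigned block_size_id = (bd >> 4) & 0x07;
  if (block_size_id < 4) return std::nullopt;
  h.block_max_size = std::size_t{1} << (8 + 2 * block_size_id);

  h.header_size = 4 + 2 + (h.content_size_present ? 8 : 0) +
                  (h.dict_id_present ? 4 : 0) + 1;
  if (size < h.header_size) return std::nullopt;  // truncated header
  if (h.content_size_present) h.content_size = ReadLE(data + 6, 8);
  return h;
}
```

      Even with the content size and dictionary ID both present, the header is at most 19 bytes (4 + 2 + 8 + 4 + 1), which is the "small header" traded for self-describing block sizes, optional checksums, and independently decodable blocks.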

      The current Arrow implementation compresses using the LZ4 block format, and it would need to be updated once the spec clarification is added.

      If backwards compatibility is a concern, I would suggest adding an additional LZ4_FRAMED compression type, but that may be more noise than anything.
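      If the LZ4_FRAMED route were taken, a reader could dispatch on the declared codec roughly as below. This is a minimal sketch under stated assumptions: the `Codec` and `Lz4Decoder` enums and the `ChooseLz4Decoder` helper are hypothetical (they are not the parquet-format enum or any Arrow API), and checking the frame magic number for the existing LZ4 type is an assumed compatibility heuristic, not something the spec defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical codec list for illustration only; kLz4Framed models the extra
// LZ4_FRAMED type suggested above. This is not the actual parquet-format enum.
enum class Codec { kUncompressed, kSnappy, kGzip, kLz4, kLz4Framed };

// Which LZ4 decoding routine a reader would call for a page.
enum class Lz4Decoder { kNotLz4, kBlock, kFrame };

// LZ4 frame magic number, stored little-endian as bytes 04 22 4D 18.
constexpr std::uint32_t kLz4FrameMagic = 0x184D2204;

bool StartsWithLz4FrameMagic(const std::uint8_t* data, std::size_t size) {
  if (size < 4) return false;
  const std::uint32_t word = static_cast<std::uint32_t>(data[0]) |
                             (static_cast<std::uint32_t>(data[1]) << 8) |
                             (static_cast<std::uint32_t>(data[2]) << 16) |
                             (static_cast<std::uint32_t>(data[3]) << 24);
  return word == kLz4FrameMagic;
}

Lz4Decoder ChooseLz4Decoder(Codec codec, const std::uint8_t* data,
                            std::size_t size) {
  switch (codec) {
    case Codec::kLz4Framed:
      // New, explicit type: the payload is always an LZ4 frame.
      return Lz4Decoder::kFrame;
    case Codec::kLz4:
      // Existing type keeps its block-format meaning. Sniffing the frame magic
      // is an assumed heuristic for files whose writer framed the data under
      // the same type; it is not a guarantee.
      return StartsWithLz4FrameMagic(data, size) ? Lz4Decoder::kFrame
                                                 : Lz4Decoder::kBlock;
    default:
      return Lz4Decoder::kNotLz4;
  }
}
```

      The trade-off the issue raises is visible here: a separate type removes the ambiguity for new files, at the cost of one more enum value that every reader and writer has to handle.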

      [1] https://github.com/dask/fastparquet/issues/314


          People

            Assignee: Unassigned
            Reporter: Lawrence Chan (llchan)
            Votes: 0
            Watchers: 9
