Parquet / PARQUET-2222

[Format] RLE encoding spec incorrect for v2 data pages

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: format-2.10.0
    • Component/s: parquet-format
    • Labels: None

    Description

      The spec (https://github.com/apache/parquet-format/blob/master/Encodings.md#run-length-encoding--bit-packing-hybrid-rle--3) defines the RLE/bit-packing hybrid encoding as follows:

      rle-bit-packed-hybrid: <length> <encoded-data>
      length := length of the <encoded-data> in bytes stored as 4 bytes little endian (unsigned int32)
      

      However, the 4-byte length is actually prepended only in v1 data pages, not in v2 data pages, so the spec as written is incorrect for v2.
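
      To illustrate the difference, here is a minimal sketch in Python of how a reader might locate the <encoded-data> bytes in each case. The function names are hypothetical and are not taken from any Parquet implementation. For v2, the sketch assumes the byte length is supplied by the caller from some other source (for example, the page header), since the issue only states that no length is prepended.

```python
import struct


def split_rle_v1(buf: bytes, offset: int = 0):
    """v1 data page layout as described by the spec:
    a 4-byte little-endian unsigned length, then <encoded-data>.
    Returns (encoded_data, offset_after_data)."""
    if offset + 4 > len(buf):
        raise ValueError("buffer too short for 4-byte length prefix")
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("length prefix points past end of buffer")
    return buf[start:end], end


def split_rle_v2(buf: bytes, byte_length: int, offset: int = 0):
    """v2 data page layout as described in this issue:
    no length prefix. ASSUMPTION: byte_length is known to the
    caller from elsewhere (e.g. the page header).
    Returns (encoded_data, offset_after_data)."""
    end = offset + byte_length
    if byte_length < 0 or end > len(buf):
        raise ValueError("byte_length points past end of buffer")
    return buf[offset:end], end


# Two opaque payload bytes standing in for <encoded-data>.
payload = bytes([0x06, 0x01])
v1_page = struct.pack("<I", len(payload)) + payload + b"values"
v2_page = payload + b"values"

v1_data, v1_next = split_rle_v1(v1_page)
v2_data, v2_next = split_rle_v2(v2_page, len(payload))
```

      A reader that applies the v1 routine to a v2 page would misinterpret the first four bytes of the encoded data as a length, which is why the spec wording matters.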

          People

            Assignee: Unassigned
            Reporter: Antoine Pitrou (apitrou)
            Votes: 0
            Watchers: 5
