Parquet / PARQUET-2108

Specification for RLEDictionary encoding is incorrect.


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The spec for RLE Dictionary encoding says the "length of the encoded-data" is placed before the "encoded-data". Reproducing the first 3 lines here:

      ```
      rle-bit-packed-hybrid: <length> <encoded-data>
      length := length of the <encoded-data> in bytes stored as 4 bytes little endian (unsigned int32)
      encoded-data := <run>*
      ```

      However, this does not match the implementation. Parquet-MR does not write the length in front of the data; it writes the bitWidth as a single byte instead. See implementation.
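
To make the difference concrete, here is a minimal sketch of the two byte layouts being compared: the one quoted from the spec and the one this report describes. It is not taken from Parquet-MR. The function names are hypothetical, and the payload is treated as opaque bytes standing in for `<encoded-data>`.

```python
import struct


def frame_as_spec_quoted(encoded_data: bytes) -> bytes:
    """Layout quoted from the spec: <length> <encoded-data>.

    The length is an unsigned 32-bit little-endian integer.
    """
    return struct.pack("<I", len(encoded_data)) + encoded_data


def frame_as_reported(bit_width: int, encoded_data: bytes) -> bytes:
    """Layout described in this report: <bitWidth as 1 byte> <encoded-data>.

    No length prefix is written. bytes([x]) raises ValueError if x does
    not fit in a single byte.
    """
    return bytes([bit_width]) + encoded_data


def read_as_spec_quoted(buf: bytes):
    """Read a 4-byte little-endian length, then that many payload bytes."""
    (length,) = struct.unpack_from("<I", buf, 0)
    return length, buf[4:4 + length]


def read_as_reported(buf: bytes):
    """Read a 1-byte bit width; the rest of the buffer is the payload."""
    return buf[0], buf[1:]
```

Framing the same payload both ways, for example `frame_as_spec_quoted(b"\x10\x05")` and `frame_as_reported(3, b"\x10\x05")`, gives buffers of different sizes (six bytes versus three). A reader that follows the quoted spec text would interpret the bit-width byte and the first three payload bytes as a length, which is why the mismatch matters to anyone implementing from the spec.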

      I'm proposing the spec be updated to state the above clearly.

      See the discussion here:

      https://lists.apache.org/thread/p45tpjd5r03qbswtpr7xfy072josnjxs

       

          People

            Assignee: Unassigned
            Reporter: Balaji K (gamaken)