Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
The spec for RLE Dictionary encoding says the "length of the encoded-data" is placed before the "encoded-data". Reproducing the first 3 lines here:
```
rle-bit-packed-hybrid: <length> <encoded-data>
length := length of the <encoded-data> in bytes stored as 4 bytes little endian (unsigned int32)
encoded-data := <run>*
```
However, this does not match the implementation. Parquet-MR does not write the length in front of the encoded data; it writes the bitWidth as a single byte instead. See the implementation.
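To make the difference concrete, here is a minimal Python sketch of the two layouts: the length-prefixed form quoted from the spec, and the bitWidth-prefixed form this ticket attributes to Parquet-MR. The function names and the sample payload are hypothetical and chosen only for illustration; this is not code taken from Parquet-MR.

```python
import struct


def read_length_prefixed(buf: bytes):
    """Layout as quoted from the spec: 4-byte little-endian length, then the runs."""
    (length,) = struct.unpack_from("<I", buf, 0)
    return length, buf[4:4 + length]


def read_bit_width_prefixed(buf: bytes):
    """Layout as reported in this ticket: 1-byte bit width, then the runs."""
    bit_width = buf[0]
    return bit_width, buf[1:]


# Hypothetical payload: a single RLE run of 8 copies of the value 2 at bit width 3.
# RLE header = run length << 1 = 16 (0x10), followed by the value in one byte.
runs = bytes([0x10, 0x02])

# The same runs framed in the two different ways.
as_spec_text = struct.pack("<I", len(runs)) + runs  # 4-byte length prefix
as_reported = bytes([3]) + runs                     # 1-byte bitWidth prefix

print(as_spec_text.hex())  # 020000001002
print(as_reported.hex())   # 031002
```

A reader that expects one framing and receives the other would interpret the leading bytes incorrectly, which is why the spec wording matters for anyone writing an independent implementation.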
I'm proposing the spec be updated to state the above clearly.
See the discussion here:
https://lists.apache.org/thread/p45tpjd5r03qbswtpr7xfy072josnjxs