Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 0.11.0
Description
Using pyarrow v0.11.0, the attached script writes a simple table (taken from the pyarrow documentation) in Parquet format versions 1.0 and 2.0, each with and without dictionary encoding enabled.
Inspecting the written files with parquet-tools suggests that dictionary encoding is not used in either of the version 2.0 files. Both files report that the columns are encoded using PLAIN,RLE and that the dictionary page offset is zero. I was expecting the column encodings to include RLE_DICTIONARY. Attached are the script with repro steps and the files it generated.
Below is the output of running parquet-tools meta on the version 2.0 files:
version='2.0', use_dictionary = True

% parquet-tools meta example_v2.0_dict_True.parquet
file: file:.../example_v2.0_dict_True.parquet
creator: parquet-cpp version 1.5.1-SNAPSHOT
{ Unknown macro: {extra} }}

file schema: schema
--------------------------------------------------------------------------------
one: OPTIONAL DOUBLE R:0 D:1
three: OPTIONAL BOOLEAN R:0 D:1
two: OPTIONAL BINARY R:0 D:1
__index_level_0__: OPTIONAL BINARY R:0 D:1

row group 1: RC:3 TS:211 OFFSET:4
--------------------------------------------------------------------------------
one: DOUBLE SNAPPY DO:0 FPO:4 SZ:65/63/0.97 VC:3 ENC:PLAIN,RLE ST:[min: -1.0, max: 2.5, num_nulls: 1]
three: BOOLEAN SNAPPY DO:0 FPO:142 SZ:36/34/0.94 VC:3 ENC:PLAIN,RLE ST:[min: false, max: true, num_nulls: 0]
two: BINARY SNAPPY DO:0 FPO:225 SZ:60/58/0.97 VC:3 ENC:PLAIN,RLE ST:[min: 0x626172, max: 0x666F6F, num_nulls: 0]
__index_level_0__: BINARY SNAPPY DO:0 FPO:328 SZ:50/48/0.96 VC:3 ENC:PLAIN,RLE ST:[min: 0x61, max: 0x63, num_nulls: 0]

version='2.0', use_dictionary = False

% parquet-tools meta example_v2.0_dict_False.parquet
file: file:.../example_v2.0_dict_False.parquet
creator: parquet-cpp version 1.5.1-SNAPSHOT
{ Unknown macro: {extra} }}
Attachments
Issue Links
- links to