Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Requires PARQUET-1324 and probably quite a bit of extra work.

Properly implementing this will require dictionary normalization across row groups: when reading a new row group, a fast path that compares the current dictionary with the prior dictionary should be used, so that identical dictionaries incur no remapping cost. This also needs to handle the case where a column chunk "fell back" to PLAIN encoding mid-stream.
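The normalization described above can be sketched roughly as follows. This is a minimal standalone illustration, not the Arrow C++ implementation: `Unifier`, `Intern`, `AddDictChunk`, and `AddPlainChunk` are hypothetical names, and plain `std::string`/`std::vector` stand in for Arrow arrays. It shows the three cases: the fast path when a row group's dictionary matches the unified one, index remapping when it differs, and hashing raw values back into the dictionary when a chunk fell back to PLAIN encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: accumulate one unified dictionary across row groups,
// remapping each chunk's indices into it.
struct Unifier {
  std::vector<std::string> dict;                 // unified dictionary values
  std::unordered_map<std::string, int32_t> pos;  // value -> unified index

  // Return the unified index for a value, appending it if unseen.
  int32_t Intern(const std::string& v) {
    auto it = pos.find(v);
    if (it != pos.end()) return it->second;
    int32_t idx = static_cast<int32_t>(dict.size());
    dict.push_back(v);
    pos.emplace(v, idx);
    return idx;
  }

  // Dictionary-encoded chunk. Fast path: if the chunk's dictionary equals
  // the unified one, its indices are already valid. Otherwise build a
  // chunk-index -> unified-index remap table and translate the indices.
  std::vector<int32_t> AddDictChunk(const std::vector<std::string>& chunk_dict,
                                    const std::vector<int32_t>& indices) {
    if (chunk_dict == dict) return indices;  // dictionaries identical
    std::vector<int32_t> remap(chunk_dict.size());
    for (size_t i = 0; i < chunk_dict.size(); ++i) {
      remap[i] = Intern(chunk_dict[i]);
    }
    std::vector<int32_t> out;
    out.reserve(indices.size());
    for (int32_t ix : indices) out.push_back(remap[ix]);
    return out;
  }

  // PLAIN-encoded chunk (mid-stream dictionary fallback): no indices exist,
  // so every value is hashed back into the unified dictionary.
  std::vector<int32_t> AddPlainChunk(const std::vector<std::string>& values) {
    std::vector<int32_t> out;
    out.reserve(values.size());
    for (const auto& v : values) out.push_back(Intern(v));
    return out;
  }
};
```

The design choice worth noting is that the fallback path reuses the same `Intern` hash table as the remap path, so PLAIN chunks and dictionary chunks interleave freely without producing divergent dictionaries.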
Attachments
Issue Links
- depends upon:
  - ARROW-3144 [C++] Move "dictionary" member from DictionaryType to ArrayData to allow for changing dictionaries between Array chunks (Resolved)
  - ARROW-3772 [C++] Read Parquet dictionary encoded ColumnChunks directly into an Arrow DictionaryArray (Resolved)
  - ARROW-6042 [C++] Implement alternative DictionaryBuilder that always yields int32 indices (Resolved)
  - ARROW-6065 [C++] Reorganize parquet/arrow/reader.cc, remove code duplication, improve readability (Resolved)
  - ARROW-6077 [C++][Parquet] Build logical schema tree mapping Arrow fields to Parquet schema levels (Resolved)
- is related to:
  - ARROW-3652 [Python] CategoricalIndex is lost after reading back (Resolved)
- relates to:
  - ARROW-6140 [C++][Parquet] Support direct dictionary decoding of types other than BYTE_ARRAY (Open)
  - ARROW-3769 [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray (Resolved)
  - ARROW-3772 [C++] Read Parquet dictionary encoded ColumnChunks directly into an Arrow DictionaryArray (Resolved)
  - ARROW-3246 [Python][Parquet] direct reading/writing of pandas categoricals in parquet (Resolved)
- links to