Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Labels: None
Description
If the goal is to hash this data into a categorical-type array anyway, it would be better to offer the option to "push down" the hashing into the Parquet read hot path, rather than first fully materializing a dense vector of ByteArray values, which can use a lot of memory after decompression.
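A minimal sketch of the intended effect, assuming a hypothetical DecodeToDictionary helper that stands in for the real decoder hot path: each decoded value is appended straight into an arrow::StringDictionaryBuilder, which hashes it on arrival, so memory use is bounded by the distinct values plus integer indices rather than by a dense vector of all values.

```cpp
#include <memory>
#include <string>
#include <vector>

#include <arrow/api.h>

// Hypothetical stand-in for the decoder hot path: values are hashed into
// the dictionary builder as they are decoded, instead of being collected
// into a dense vector of ByteArray values first.
arrow::Result<std::shared_ptr<arrow::Array>> DecodeToDictionary(
    const std::vector<std::string>& decoded_values) {
  arrow::StringDictionaryBuilder builder;
  for (const auto& value : decoded_values) {
    // Each append hashes the value; a repeated value only adds an index.
    ARROW_RETURN_NOT_OK(builder.Append(value));
  }
  std::shared_ptr<arrow::Array> out;  // dictionary<values=utf8> array
  ARROW_RETURN_NOT_OK(builder.Finish(&out));
  return out;
}
```

Wiring the actual ByteArray and FixedLenByteArray decoders into such a builder is what PARQUET-1508 (linked below) tracks on the C++ side.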
Issue Links
- is related to: ARROW-3325 [Python] Support reading Parquet binary/string columns directly as DictionaryArray (Resolved)
- relates to: PARQUET-1508 [C++] Enable reading from ByteArray and FixedLenByteArray decoders directly into arrow::BinaryBuilder or arrow::BinaryDictionaryBuilder (Resolved)