Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Labels: None
Description
If the goal is to hash this data into a categorical-type array anyway, it would be better to offer the option to "push down" the hashing into the Parquet read hot path, rather than first fully materializing a dense vector of ByteArray values, which can use a lot of memory after decompression.
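A minimal sketch of the intended effect, assuming a hypothetical DecodeToDictionary helper that stands in for the real decoder hot path: each decoded value is appended straight into an arrow::StringDictionaryBuilder, which hashes it on arrival, so memory use is bounded by the distinct values plus integer indices rather than by a dense vector of all values.

```cpp
#include <memory>
#include <string>
#include <vector>

#include <arrow/api.h>

// Hypothetical stand-in for the decoder hot path: values are hashed into
// the dictionary builder as they are decoded, instead of being collected
// into a dense vector of ByteArray values first.
arrow::Result<std::shared_ptr<arrow::Array>> DecodeToDictionary(
    const std::vector<std::string>& decoded_values) {
  arrow::StringDictionaryBuilder builder;
  for (const auto& value : decoded_values) {
    // Each append hashes the value; a repeated value only adds an index.
    ARROW_RETURN_NOT_OK(builder.Append(value));
  }
  std::shared_ptr<arrow::Array> out;  // dictionary<values=utf8> array
  ARROW_RETURN_NOT_OK(builder.Finish(&out));
  return out;
}
```

Wiring the actual ByteArray and FixedLenByteArray decoders into such a builder is what PARQUET-1508 (linked below) tracks on the C++ side.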
Issue Links
- is related to: ARROW-3325 [Python] Support reading Parquet binary/string columns directly as DictionaryArray (Resolved)
- relates to: PARQUET-1508 [C++] Enable reading from ByteArray and FixedLenByteArray decoders directly into arrow::BinaryBuilder or arrow::BinaryDictionaryBuilder (Resolved)