Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
We use a hash table to extract unique values and dictionary indices. There may be an opportunity to consolidate common code with the dictionary encoding implemented in parquet-cpp (though the dictionary indices will not be run-length encoded in Arrow):
https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/dictionary-encoding.h
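A minimal sketch of the hash-table approach, assuming std::string values and int32 indices for illustration (this is not the parquet-cpp or Arrow implementation): each value is looked up in a memo table, unseen values are appended to the dictionary, and every value is replaced by its dictionary index.

// Sketch only: hash-table dictionary encoding of a single array of values.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

std::pair<std::vector<std::string>, std::vector<int32_t>> DictionaryEncode(
    const std::vector<std::string>& values) {
  std::unordered_map<std::string, int32_t> memo;  // value -> dictionary index
  std::vector<std::string> dictionary;
  std::vector<int32_t> indices;
  indices.reserve(values.size());
  for (const auto& v : values) {
    auto it = memo.find(v);
    if (it == memo.end()) {
      // First time we see this value: assign it the next dictionary slot.
      int32_t index = static_cast<int32_t>(dictionary.size());
      memo.emplace(v, index);
      dictionary.push_back(v);
      indices.push_back(index);
    } else {
      indices.push_back(it->second);
    }
  }
  return {std::move(dictionary), std::move(indices)};
}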
This functionality also needs to support encoding that is split across multiple record batches: the hash table would be a stateful entity, so we can continue hashing additional chunks of data and, at the end, dictionary-encode multiple arrays that share a single dictionary. A sketch of such a stateful encoder follows.
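A hypothetical sketch of that stateful variant, where the memo table lives inside an encoder object and persists across chunks; the class name and types are assumptions, not an existing Arrow or parquet-cpp API. Each chunk yields its own index vector, and the shared dictionary is materialized once at the end.

// Sketch only: stateful encoder that dictionary-encodes multiple chunks
// against one shared dictionary.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

class StringDictionaryEncoder {  // hypothetical name
 public:
  // Encode one chunk (e.g. one record batch); the memo table persists
  // across calls, so indices from all chunks refer to the same dictionary.
  std::vector<int32_t> EncodeChunk(const std::vector<std::string>& chunk) {
    std::vector<int32_t> indices;
    indices.reserve(chunk.size());
    for (const auto& v : chunk) {
      auto it = memo_.find(v);
      if (it == memo_.end()) {
        int32_t index = static_cast<int32_t>(dictionary_.size());
        memo_.emplace(v, index);
        dictionary_.push_back(v);
        indices.push_back(index);
      } else {
        indices.push_back(it->second);
      }
    }
    return indices;
  }

  // The dictionary shared by all chunks encoded so far.
  const std::vector<std::string>& dictionary() const { return dictionary_; }

 private:
  std::unordered_map<std::string, int32_t> memo_;
  std::vector<std::string> dictionary_;
};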