Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
One problem with the current DictionaryBuilder<T> is that, when it is used to produce a series of arrays that form a ChunkedArray, the constituent chunks may end up with different index widths. For example:
chunk 0: int8 indices
chunk 1: int16 indices
chunk 2: int16 indices
chunk 3: int32 indices
chunk 4: int32 indices
chunk 5: int32 indices
chunk 6: int32 indices
Obviously this is problematic for applications that expect every chunk of a ChunkedArray to have the same type. I'm running into this issue in the context of ARROW-3772, where we are looking to decode Parquet data directly to DictionaryArray without stepping through an intermediate dense decoded stage.
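To make the failure mode concrete, here is a minimal toy model in plain C++. It is not Arrow code: it assumes the index builder picks the smallest signed integer width that can hold the largest index seen so far, and that the dictionary keeps growing from chunk to chunk. The names `MinIndexWidthBits` and `ChunkIndexWidths` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model, not Arrow code: smallest signed index width (in bits) that can
// address every entry of a dictionary holding `dictionary_size` values.
inline int MinIndexWidthBits(std::size_t dictionary_size) {
  assert(dictionary_size > 0);
  const std::uint64_t max_index = static_cast<std::uint64_t>(dictionary_size) - 1;
  constexpr std::uint64_t kInt8Max = 127;
  constexpr std::uint64_t kInt16Max = 32767;
  constexpr std::uint64_t kInt32Max = 2147483647;
  if (max_index <= kInt8Max) return 8;
  if (max_index <= kInt16Max) return 16;
  if (max_index <= kInt32Max) return 32;
  return 64;
}

// Index width chosen for each chunk. `dictionary_sizes[i]` is the assumed
// (hypothetical) dictionary size at the moment chunk i is finished.
inline std::vector<int> ChunkIndexWidths(
    const std::vector<std::size_t>& dictionary_sizes) {
  std::vector<int> widths;
  widths.reserve(dictionary_sizes.size());
  for (std::size_t size : dictionary_sizes) {
    widths.push_back(MinIndexWidthBits(size));
  }
  return widths;
}
```

With assumed dictionary sizes of 100, 200, 30000 and 40000 entries after successive chunks, this model yields widths of 8, 16, 16 and 32 bits, which is the kind of mixed-width sequence described above.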
I'm not sure what to call the class, whether DictionaryInt32Builder or something similar, but it would have more or less the same API as DictionaryBuilder while using Int32Builder for the indices instead of AdaptiveIntBuilder.
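As a rough illustration of the idea, the sketch below models a dictionary builder whose indices are always 32-bit. It uses only the C++ standard library and is not Arrow's API: the class name `ToyDictionaryInt32Builder`, its methods, and the choice to keep the dictionary alive across chunks are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the proposed builder; not Arrow's actual API.
// Indices are always int32_t, however small or large the dictionary is.
class ToyDictionaryInt32Builder {
 public:
  // Append one value, adding it to the dictionary the first time it is seen.
  void Append(const std::string& value) {
    auto it = memo_.find(value);
    if (it == memo_.end()) {
      // The fixed width caps the dictionary size at what int32_t can index.
      assert(dictionary_.size() <
             static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));
      const auto index = static_cast<std::int32_t>(dictionary_.size());
      it = memo_.emplace(value, index).first;
      dictionary_.push_back(value);
    }
    indices_.push_back(it->second);
  }

  // Return the indices accumulated so far and start a new chunk. The
  // dictionary is kept (an assumption of this sketch), so later chunks
  // continue to grow it.
  std::vector<std::int32_t> FinishChunk() {
    std::vector<std::int32_t> out = std::move(indices_);
    indices_.clear();  // leave the moved-from vector in a known empty state
    return out;
  }

  const std::vector<std::string>& dictionary() const { return dictionary_; }

 private:
  std::unordered_map<std::string, std::int32_t> memo_;
  std::vector<std::string> dictionary_;
  std::vector<std::int32_t> indices_;
};
```

Because every chunk returned by `FinishChunk()` has the same element type, the chunks can be combined without any index widening. The trade-off is that small dictionaries use more index memory than an adaptive builder would.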
Issue Links
- is depended upon by: ARROW-3325 [Python] Support reading Parquet binary/string columns directly as DictionaryArray (Resolved)