ARROW-6042 (Apache Arrow)

[C++] Implement alternative DictionaryBuilder that always yields int32 indices

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: C++

      Description

      One problem with the current DictionaryBuilder<T> in some applications is that, if it is used to produce a series of arrays to form a ChunkedArray, it may yield constituent chunks having different index widths. For example:

      chunk 0: int8 indices
      chunk 1: int16 indices
      chunk 2: int16 indices
      chunk 3: int32 indices
      chunk 4: int32 indices
      chunk 5: int32 indices
      chunk 6: int32 indices
      

      This is problematic for applications that need every chunk of a ChunkedArray to have the same type, because dictionary types with different index widths are different types. I'm running into this issue in the context of ARROW-3772, where we are looking to decode Parquet data directly to DictionaryArray without stepping through an intermediate dense decoded stage.
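      To make the width drift concrete, here is a minimal, self-contained sketch. It is a toy model, not Arrow code: `MinIndexBits` and `ChunkIndexBits` are hypothetical helpers. It assumes the dictionary keeps growing from chunk to chunk and that each chunk gets the narrowest signed index width that fits the dictionary at that point.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model, not Arrow code: narrowest signed index width (in bits) that can
// address every entry of a dictionary holding `dict_size` values.
inline int MinIndexBits(std::size_t dict_size) {
  if (dict_size == 0) return 8;
  const std::uint64_t max_index = static_cast<std::uint64_t>(dict_size) - 1;
  if (max_index <= 0x7Fu) return 8;         // fits in int8
  if (max_index <= 0x7FFFu) return 16;      // fits in int16
  if (max_index <= 0x7FFFFFFFu) return 32;  // fits in int32
  return 64;
}

// Assumption for illustration: the dictionary accumulates across chunks, and
// each chunk's index width is chosen from the dictionary size at that point.
inline std::vector<int> ChunkIndexBits(
    const std::vector<std::size_t>& new_values_per_chunk) {
  std::vector<int> widths;
  std::size_t dict_size = 0;
  for (std::size_t n : new_values_per_chunk) {
    dict_size += n;
    widths.push_back(MinIndexBits(dict_size));
  }
  return widths;
}
```

      Under this model, chunks that add 100, 100, and 40000 new distinct values end up with 8-, 16-, and 32-bit indices respectively, so the chunks no longer share one type.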

      I'm not sure what to call the class (DictionaryInt32Builder or something similar), but it would have more or less the same API as DictionaryBuilder while using Int32Builder for the indices rather than AdaptiveIntBuilder.
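      The shape of such a builder can be sketched without Arrow at all. The following is a hypothetical toy (`ToyInt32DictionaryBuilder` and `ToyDictChunk` are made-up names, and strings stand in for the value type), meant only to show the idea of fixing the index type at int32 so that every finished chunk has the same index width. It is not the implementation that resolved this issue.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy type, not part of Arrow: one finished chunk.
struct ToyDictChunk {
  std::vector<std::string> dictionary;  // distinct values seen so far
  std::vector<std::int32_t> indices;    // always 32-bit, whatever the size
};

// Hypothetical toy builder: dictionary-encodes strings with int32 indices.
// It does not guard against more than INT32_MAX distinct values.
class ToyInt32DictionaryBuilder {
 public:
  void Append(const std::string& value) {
    auto it = memo_.find(value);
    if (it == memo_.end()) {
      // First occurrence: assign the next dictionary slot.
      const auto index = static_cast<std::int32_t>(dictionary_.size());
      it = memo_.emplace(value, index).first;
      dictionary_.push_back(value);
    }
    indices_.push_back(it->second);
  }

  // Returns the pending indices as a chunk. The dictionary is kept so that
  // later chunks reuse the same codes (an assumption of this sketch).
  ToyDictChunk Finish() {
    ToyDictChunk chunk{dictionary_, std::move(indices_)};
    indices_.clear();  // leave the moved-from vector in a known empty state
    return chunk;
  }

 private:
  std::unordered_map<std::string, std::int32_t> memo_;
  std::vector<std::string> dictionary_;
  std::vector<std::int32_t> indices_;
};
```

      The trade-off is memory: small dictionaries pay four bytes per index where one or two would do, in exchange for chunks that all share a single index type.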

              People

              • Assignee: Wes McKinney (wesm)
              • Reporter: Wes McKinney (wesm)

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 50m