  1. Apache Arrow
  2. ARROW-6042

[C++] Implement alternative DictionaryBuilder that always yields int32 indices

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: C++

    Description

      One problem with the current DictionaryBuilder<T> in some applications is that, if it is used to produce a series of arrays to form a ChunkedArray, it may yield constituent chunks having different index widths. For example:

      chunk 0: int8 indices
      chunk 1: int16 indices
      chunk 2: int16 indices
      chunk 3: int32 indices
      chunk 4: int32 indices
      chunk 5: int32 indices
      chunk 6: int32 indices
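
A rough sketch of how such a sequence can arise, assuming the index width of each chunk is the narrowest signed integer type that fits the dictionary size reached so far. `NarrowestIndexType` and `ChunkIndexTypes` are hypothetical helpers written for illustration, not Arrow APIs:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow API): name of the narrowest signed
// integer type able to hold dictionary indices 0 .. dict_size - 1.
inline std::string NarrowestIndexType(int64_t dict_size) {
  assert(dict_size >= 0);
  const int64_t max_index = dict_size > 0 ? dict_size - 1 : 0;
  if (max_index <= std::numeric_limits<int8_t>::max()) return "int8";
  if (max_index <= std::numeric_limits<int16_t>::max()) return "int16";
  if (max_index <= std::numeric_limits<int32_t>::max()) return "int32";
  return "int64";
}

// Index type of each chunk when the dictionary keeps growing across chunks.
// cumulative_dict_sizes[i] is the number of distinct values seen by the
// end of chunk i (an assumption made for this sketch).
inline std::vector<std::string> ChunkIndexTypes(
    const std::vector<int64_t>& cumulative_dict_sizes) {
  std::vector<std::string> types;
  for (int64_t size : cumulative_dict_sizes) {
    types.push_back(NarrowestIndexType(size));
  }
  return types;
}
```

With cumulative dictionary sizes of 100, 1000, 30000 and 40000, the chunks would come out as int8, int16, int16 and int32 under these assumptions.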
      

      This is problematic for these applications because all chunks of a ChunkedArray must share one type, and dictionary types with different index widths are different types. I'm running into this issue in the context of ARROW-3772, where we are looking to decode Parquet data directly to DictionaryArray without stepping through an intermediate dense decoded stage.

      I'm not sure what to call the class, whether DictionaryInt32Builder or something similar, but it would have more or less the same API as DictionaryBuilder while using Int32Builder rather than AdaptiveIntBuilder for the indices.
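
To make the idea concrete, here is a minimal, library-free sketch of a builder whose indices are always int32, regardless of how small the dictionary is. The class name `ToyInt32DictionaryBuilder` and its methods are hypothetical and only mirror the proposal in spirit; this is not Arrow's implementation or API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for the proposed class; name and methods are hypothetical.
class ToyInt32DictionaryBuilder {
 public:
  struct Chunk {
    std::vector<int32_t> indices;         // always int32, never narrowed
    std::vector<std::string> dictionary;  // distinct values seen so far
  };

  // Append one value, adding it to the dictionary on first sight.
  void Append(const std::string& value) {
    auto it = memo_.find(value);
    if (it == memo_.end()) {
      // The dictionary must stay addressable with an int32 index.
      assert(dictionary_.size() <
             static_cast<std::size_t>(std::numeric_limits<int32_t>::max()));
      it = memo_.emplace(value, static_cast<int32_t>(dictionary_.size())).first;
      dictionary_.push_back(value);
    }
    indices_.push_back(it->second);
  }

  // Emit the indices accumulated since the last Finish(). The dictionary is
  // kept, so later chunks reuse the indices assigned in earlier chunks.
  Chunk Finish() {
    Chunk out{std::move(indices_), dictionary_};
    indices_.clear();
    return out;
  }

 private:
  std::unordered_map<std::string, int32_t> memo_;
  std::vector<std::string> dictionary_;
  std::vector<int32_t> indices_;
};
```

Because the index vector has a fixed element type, every chunk produced by repeated `Append` / `Finish` calls has the same index width, which is the property the proposal is after. The trade-off is that small dictionaries use more index memory than an adaptive builder would.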

            People

              Assignee: Wes McKinney (wesm)
              Reporter: Wes McKinney (wesm)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m