Apache Arrow / ARROW-17733

[C++] Concatenating dictionary arrays with nulls fills wrong parts of index buffer with 0.


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 10.0.0
    • C++

    Description

      When concatenating dictionary arrays that contain nulls and whose index type is wider than 8 bits, the wrong bytes of the index buffer get zeroed out.

      Example using pyarrow:

      import pyarrow as pa
      dictionary_type = pa.dictionary(pa.int16(), pa.string())
      empty_array = pa.array([], dictionary_type)
      array1 = pa.array(["a", "b", None], dictionary_type)
      array2 = pa.concat_arrays([empty_array, array1])
      print(array1.to_pylist())
      print(array2.to_pylist()) 

      We would expect array1 and array2 to be the same, but this prints:

      ['a', 'b', None]
      ['a', 'a', None] 

       

      This bug happens because the index type is 2 bytes wide, so the null at position 2 should result in zeroing out bytes 4-5 (0-indexed) of the index buffer. Instead, the code zeroes out bytes 2-3, which hold the index for position 1, so "b" is overwritten with index 0 and reads back as "a". The cause is that the width of the index type is not taken into account when adding the position here:

      https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/concatenate.cc#L314-L315
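
The offset arithmetic described above can be illustrated with a small pure-Python model. This is only a sketch of the byte math, not the Arrow C++ code: the helper names `zero_null_slot` and `decode` are hypothetical, the buffer is assumed to hold little-endian int16 indices, and the value 7 stands in for whatever garbage may sit in the null slot.

```python
import struct

INDEX_WIDTH = 2  # bytes per int16 dictionary index


def zero_null_slot(buffer: bytearray, position: int, scale_by_width: bool) -> None:
    """Zero the index slot of a null at `position` (hypothetical helper).

    With scale_by_width=False the byte offset is just `position`,
    which models the bug; with True it is position * INDEX_WIDTH.
    """
    start = position * INDEX_WIDTH if scale_by_width else position
    buffer[start:start + INDEX_WIDTH] = bytes(INDEX_WIDTH)


def decode(buffer: bytearray) -> list:
    """Read the buffer back as a list of little-endian int16 values."""
    count = len(buffer) // INDEX_WIDTH
    return list(struct.unpack(f"<{count}h", buffer))


# Indices for ["a", "b", None]: a -> 0, b -> 1, null slot holds arbitrary data.
original = struct.pack("<3h", 0, 1, 7)

buggy = bytearray(original)
zero_null_slot(buggy, 2, scale_by_width=False)  # zeroes bytes 2-3

fixed = bytearray(original)
zero_null_slot(fixed, 2, scale_by_width=True)   # zeroes bytes 4-5

print(decode(buggy))  # [0, 0, 7]: the index of "b" was clobbered
print(decode(fixed))  # [0, 1, 0]: only the null slot was zeroed
```

In the buggy case the second index becomes 0, which matches the observed output of ['a', 'a', None]; the null slot itself is still masked by the validity bitmap, so it continues to read as None.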

            People

              rasnjo Rasmus Johansen
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m