  Apache Arrow / ARROW-5887

[C#] ArrowStreamWriter writes FieldNodes in wrong order


Details

    Description

      When ArrowStreamWriter writes a RecordBatch that contains `null`s, it mixes up the columns' `NullCount` values.

      You can see here:

      https://github.com/apache/arrow/blob/90affbd2c41e80aa8c3fac1e4dbff60aafb415d3/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L195-L200

      It writes the fields in order from 0 to `fieldCount`. But further down, it writes them in order from `fieldCount` to 0.

      The Java implementation has this comment:

      // struct vectors have to be created in reverse order
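      To illustrate why the loop direction matters, here is a small self-contained sketch. It assumes, as the Java comment suggests, that the FlatBuffers builder serializes a struct vector back to front, so each field node added ends up in front of the nodes added before it. `Serialize`, `PrintRoundTrip`, and the `List<int>` standing in for the FieldNode vector are hypothetical illustrations, not the Arrow or FlatBuffers API; the reader is modeled as pairing field node i with schema field i purely by position.

```csharp
using System;
using System.Collections.Generic;

// Column names and null counts from the test RecordBatch in this issue.
string[] fieldNames = { "age", "CharCount" };
int[] nullCounts = { 1, 0 }; // "age" holds one null, "CharCount" none

// Hypothetical model of serializing the FieldNode vector. Like a FlatBuffers
// builder, it writes back to front, so each node added is placed *before*
// the nodes added earlier. Not the real FlatBuffers or Arrow API.
List<int> Serialize(bool reverse)
{
    var vector = new List<int>();
    for (var k = 0; k < nullCounts.Length; k++)
    {
        // Forward visits 0 .. fieldCount - 1; reverse visits fieldCount - 1 .. 0.
        var i = reverse ? nullCounts.Length - 1 - k : k;
        vector.Insert(0, nullCounts[i]); // prepend, as a back-to-front builder does
    }
    return vector;
}

// The reader pairs field node i with schema field i purely by position.
void PrintRoundTrip(string label, List<int> nodes)
{
    Console.WriteLine(label);
    for (var i = 0; i < fieldNames.Length; i++)
        Console.WriteLine($"  {fieldNames[i]}: NullCount = {nodes[i]}");
}

PrintRoundTrip("forward order (bug):", Serialize(reverse: false));
PrintRoundTrip("reverse order (fix):", Serialize(reverse: true));
```

      In this model the forward loop reports "age" with `NullCount` 0 and "CharCount" with `NullCount` 1, the same swap described in this issue, while the reverse loop keeps each count with its own column.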


      A simple test of roundtripping the following RecordBatch shows the issue:


      using Apache.Arrow;
      using Apache.Arrow.Types;

      var result = new RecordBatch(
          new Schema.Builder()
              .Field(f => f.Name("age").DataType(Int32Type.Default))
              .Field(f => f.Name("CharCount").DataType(Int32Type.Default))
              .Build(),
          new IArrowArray[]
          {
              // "age": one value whose validity bit is 0, i.e. a single null
              new Int32Array(
                  new ArrowBuffer.Builder<int>().Append(0).Build(),
                  new ArrowBuffer.Builder<byte>().Append(0).Build(),
                  length: 1,
                  nullCount: 1,
                  offset: 0),
              // "CharCount": one non-null value (7), so no validity bitmap is needed
              new Int32Array(
                  new ArrowBuffer.Builder<int>().Append(7).Build(),
                  ArrowBuffer.Empty,
                  length: 1,
                  nullCount: 0,
                  offset: 0)
          },
          length: 1);
      

      Here, the "age" column should contain a `null`. However, when you write this RecordBatch and read it back, the "CharCount" column has `NullCount` == 1 and the "age" column has `NullCount` == 0.


            People

              eerhardt Eric Erhardt
              Votes: 0
              Watchers: 3


                Time Tracking

                  Original Estimate: 4h
                  Remaining: 3.5h
                  Logged: 0.5h