Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
None
Description
When ArrowStreamWriter writes a RecordBatch with `null`s in it, it mixes up the columns' `NullCount`.
You can see here:
It writes the fields in 0 -> fieldCount order. But further down, it writes the fields in fieldCount -> 0 order.
Looking at the Java implementation, it says:
// struct vectors have to be created in reverse order
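The remark about reverse order can be illustrated with a toy model. The sketch below is not the Arrow or FlatBuffers code; it assumes a builder that fills its vector back to front (the hypothetical `BuildByPrepending` helper), so elements added in a forward loop end up stored in reverse. Under that assumption, looping 0 -> fieldCount over the null counts while the buffers are looped fieldCount -> 0 would pair each column with the other column's `NullCount`.

```csharp
using System;
using System.Collections.Generic;

public static class PrependOrderSketch
{
    // Hypothetical stand-in for a builder that fills its vector back to front:
    // each added element ends up in front of the ones added before it.
    private static List<int> BuildByPrepending(IEnumerable<int> values)
    {
        var vector = new List<int>();
        foreach (int value in values)
        {
            vector.Insert(0, value);
        }
        return vector;
    }

    public static void Main()
    {
        // Null counts per column in schema order: "age" has 1, "CharCount" has 0.
        int[] nullCounts = { 1, 0 };

        // Forward loop (0 -> fieldCount): the stored order comes out reversed.
        var forwardInput = new List<int>();
        for (int i = 0; i < nullCounts.Length; i++)
        {
            forwardInput.Add(nullCounts[i]);
        }
        List<int> storedFromForward = BuildByPrepending(forwardInput);

        // Reverse loop (fieldCount -> 0): the stored order matches the schema.
        var reverseInput = new List<int>();
        for (int i = nullCounts.Length - 1; i >= 0; i--)
        {
            reverseInput.Add(nullCounts[i]);
        }
        List<int> storedFromReverse = BuildByPrepending(reverseInput);

        Console.WriteLine("forward loop stores: " + string.Join(", ", storedFromForward)); // 0, 1
        Console.WriteLine("reverse loop stores: " + string.Join(", ", storedFromReverse)); // 1, 0
    }
}
```

In this model only the reverse loop leaves "age" with a null count of 1, which matches the symptom described in this report when the two loops disagree on direction.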
A simple test of roundtripping the following RecordBatch shows the issue:
var result = new RecordBatch(
    new Schema.Builder()
        .Field(f => f.Name("age").DataType(Int32Type.Default))
        .Field(f => f.Name("CharCount").DataType(Int32Type.Default))
        .Build(),
    new IArrowArray[]
    {
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(0).Build(),
            new ArrowBuffer.Builder<byte>().Append(0).Build(),
            length: 1,
            nullCount: 1,
            offset: 0),
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(7).Build(),
            ArrowBuffer.Empty,
            length: 1,
            nullCount: 0,
            offset: 0)
    },
    length: 1);
Here, the "age" column should have a `null` in it. However, when you write and read this RecordBatch back, you see that the "CharCount" column has `NullCount` == 1 and "age" column has `NullCount` == 0.
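A minimal round-trip check along these lines might look like the following. It assumes the `Apache.Arrow` NuGet package and its `ArrowStreamWriter` and `ArrowStreamReader` types from `Apache.Arrow.Ipc`; the class name and the `BuildBatch` helper are illustrative, not part of the library or of the original test.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using Apache.Arrow.Types;

public static class NullCountRoundTrip
{
    // Illustrative helper: builds the two-column batch from the description.
    private static RecordBatch BuildBatch()
    {
        return new RecordBatch(
            new Schema.Builder()
                .Field(f => f.Name("age").DataType(Int32Type.Default))
                .Field(f => f.Name("CharCount").DataType(Int32Type.Default))
                .Build(),
            new IArrowArray[]
            {
                // "age": one slot, validity bit cleared, so one null.
                new Int32Array(
                    new ArrowBuffer.Builder<int>().Append(0).Build(),
                    new ArrowBuffer.Builder<byte>().Append(0).Build(),
                    length: 1, nullCount: 1, offset: 0),
                // "CharCount": one slot, no validity buffer, so no nulls.
                new Int32Array(
                    new ArrowBuffer.Builder<int>().Append(7).Build(),
                    ArrowBuffer.Empty,
                    length: 1, nullCount: 0, offset: 0)
            },
            length: 1);
    }

    public static async Task Main()
    {
        RecordBatch original = BuildBatch();

        using (var stream = new MemoryStream())
        {
            // Write the batch, keeping the stream open for reading afterwards.
            using (var writer = new ArrowStreamWriter(stream, original.Schema, leaveOpen: true))
            {
                await writer.WriteRecordBatchAsync(original);
            }

            // Rewind and read the batch back.
            stream.Position = 0;
            using (var reader = new ArrowStreamReader(stream))
            {
                RecordBatch roundTripped = await reader.ReadNextRecordBatchAsync();

                // Compare the per-column null counts before and after.
                for (int i = 0; i < original.ColumnCount; i++)
                {
                    string name = original.Schema.GetFieldByIndex(i).Name;
                    Console.WriteLine(
                        $"{name}: wrote NullCount={original.Column(i).NullCount}, " +
                        $"read NullCount={roundTripped.Column(i).NullCount}");
                }
            }
        }
    }
}
```

If the writer behaves correctly, the written and read values should agree for each column; with the behavior described in this report, the two columns' values would appear swapped after reading.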