Apache Arrow / ARROW-7254

BaseVariableWidthVector#setSafe appears to make value offsets inconsistent

    Details

      Description

      The following program writes a file that PyArrow cannot read: version 0.14.1 segfaults, and version 0.15.1 rejects the file with the error pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4.

      Calling setRowCount again, or calling setSafe with a higher index, fixes it. The new documentation suggests that we should (must?) call VectorSchemaRoot#setRowCount at the end, but I still wouldn't have expected to get an invalid file just from calling setSafe.

      Full traceback:

      > python3 -c 'import pyarrow as pa; print(pa.ipc.open_stream(open("./test.bin", "rb")).read_pandas())'
      Traceback (most recent call last):
        File "<string>", line 1, in <module>
        File "/Users/lidavidm/Flight/arrow-5137-auth/java/venv/lib/python3.7/site-packages/pyarrow/ipc.py", line 46, in read_pandas
          table = self.read_all()
        File "pyarrow/ipc.pxi", line 330, in pyarrow.lib._CRecordBatchReader.read_all
        File "pyarrow/public-api.pxi", line 321, in pyarrow.lib.pyarrow_wrap_table
        File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4
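
To make the reported invariant concrete, here is a minimal sketch in plain Java with no Arrow dependency. It assumes the usual variable-width layout: an offsets array with one more entry than there are slots, where slot i spans offsets[i] to offsets[i + 1]. The class and method names are hypothetical, and the offsets {0, 4, 0} are only one reading of the "0!=4" in the message, not values read from the actual file.

```java
/** Hypothetical helper: checks offsets of a variable-width column for consistency. */
final class OffsetInvariantSketch {

  /**
   * Returns the index of the first slot whose offsets look inconsistent, or -1 if none.
   *
   * @param offsets one entry per slot plus a trailing end offset (length = slots + 1)
   * @param valid   valid[i] is false when slot i is null
   */
  static int firstInconsistentSlot(int[] offsets, boolean[] valid) {
    for (int i = 0; i < valid.length; i++) {
      int length = offsets[i + 1] - offsets[i];
      // Offsets must never decrease from one slot to the next.
      if (length < 0) {
        return i;
      }
      // Assumption: the reader also expects a null slot to occupy zero bytes,
      // which is what the "null slot" wording in the 0.15.1 message suggests.
      if (!valid[i] && length != 0) {
        return i;
      }
    }
    return -1;
  }
}
```

Under these assumptions, firstInconsistentSlot(new int[] {0, 4, 0}, new boolean[] {true, false}) flags slot 1, while the offsets {0, 4, 4} with the same validity pass the check.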
      

       
      Full program:

      import java.io.OutputStream;
      import java.nio.charset.StandardCharsets;
      import java.nio.file.Files;
      import java.nio.file.Paths;
      import java.util.Collections;
      import org.apache.arrow.memory.BufferAllocator;
      import org.apache.arrow.memory.RootAllocator;
      import org.apache.arrow.vector.VarCharVector;
      import org.apache.arrow.vector.VectorSchemaRoot;
      import org.apache.arrow.vector.ipc.ArrowStreamWriter;
      import org.apache.arrow.vector.types.pojo.ArrowType;
      import org.apache.arrow.vector.types.pojo.Field;
      import org.apache.arrow.vector.types.pojo.Schema;
      
      public class AsdfTest {
      
        public static void main(String[] args) throws Exception {
          Schema schema = new Schema(Collections.singletonList(Field.nullable("a", new ArrowType.Utf8())));
      
          try (BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
              VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
            root.setRowCount(2);
            VarCharVector v = (VarCharVector) root.getVector("a");
            v.setSafe(0, "asdf".getBytes(StandardCharsets.UTF_8));
            try (OutputStream output = Files.newOutputStream(Paths.get("./test.bin"))) {
              ArrowStreamWriter writer = new ArrowStreamWriter(root, null, output);
              writer.writeBatch();
              writer.close();
            }
          }
        }
      }
      

      Calling v.setNull(1) after v.setSafe(0, "asdf") does not fix it. Using set instead of setSafe fails in Java.

              People

              • Assignee: Liya Fan (fan_li_ya)
              • Reporter: David Li (lidavidm)

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 1h