Apache Arrow / ARROW-7254

BaseVariableWidthVector#setSafe appears to make value offsets inconsistent


Details

    Description

      The following program writes a file that PyArrow cannot read: 0.14.1 segfaults, and 0.15.1 rejects the file with the error pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4.

      Calling setRowCount again, or calling setSafe with a higher index, fixes it. While the new documentation suggests that we should (must?) call VectorSchemaRoot#setRowCount at the end, I wouldn't have expected calling setSafe to produce an invalid file, either.
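To picture why calling setRowCount again might repair the file, here is a minimal plain-Java model of an offsets buffer. It does not use Arrow at all. The class name FillHolesSketch and the helper fillHoles are hypothetical names for this sketch, and the idea that the workaround pads trailing offsets this way is an assumption, not something confirmed in this report.

```java
import java.util.Arrays;

// Simplified model only: a variable-width vector with N slots keeps N + 1
// offsets, and slot i occupies the bytes [offsets[i], offsets[i + 1]).
final class FillHolesSketch {

    // Hypothetical helper: give every slot after lastSet a zero-length value
    // by copying the end offset of the last written slot forward.
    static void fillHoles(int[] offsets, int lastSet, int valueCount) {
        for (int i = lastSet + 1; i < valueCount; i++) {
            offsets[i + 1] = offsets[lastSet + 1];
        }
    }

    public static void main(String[] args) {
        // Two rows; only slot 0 was written ("asdf" is 4 bytes long).
        int[] offsets = {0, 4, 0};
        fillHoles(offsets, 0, 2);
        System.out.println(Arrays.toString(offsets)); // prints [0, 4, 4]
    }
}
```

In this model, the unpadded buffer {0, 4, 0} is the shape the PyArrow error message seems to describe, and the padded buffer {0, 4, 4} is what a reader would accept.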

      Full traceback:

      > python3 -c 'import pyarrow as pa; print(pa.ipc.open_stream(open("./test.bin", "rb")).read_pandas())'
      Traceback (most recent call last):
        File "<string>", line 1, in <module>
        File "/Users/lidavidm/Flight/arrow-5137-auth/java/venv/lib/python3.7/site-packages/pyarrow/ipc.py", line 46, in read_pandas
          table = self.read_all()
        File "pyarrow/ipc.pxi", line 330, in pyarrow.lib._CRecordBatchReader.read_all
        File "pyarrow/public-api.pxi", line 321, in pyarrow.lib.pyarrow_wrap_table
        File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4
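My reading of the error is that offsets[2] is 0 while offsets[1] is 4, even though slot 1 is null and so should have zero length. The plain-Java sketch below models that check. It is a simplified illustration, not PyArrow's actual validation code, and the class and method names are hypothetical.

```java
// Simplified model of the offset invariant for a variable-width column:
// offsets must never decrease, and a null slot must have zero length.
final class OffsetInvariantSketch {

    // Returns the first offset index that breaks the invariant, or -1 if
    // the buffer is consistent. valid[i] is false when slot i is null.
    static int firstInconsistentOffset(int[] offsets, boolean[] valid) {
        for (int i = 1; i < offsets.length; i++) {
            int prev = offsets[i - 1];
            int cur = offsets[i];
            boolean slotIsNull = !valid[i - 1];
            if (slotIsNull && cur != prev) {
                return i; // null slot with non-zero length
            }
            if (cur < prev) {
                return i; // offsets went backwards
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed state after setRowCount(2) and setSafe(0, "asdf").
        int[] offsets = {0, 4, 0};
        boolean[] valid = {true, false};
        System.out.println(firstInconsistentOffset(offsets, valid)); // prints 2
    }
}
```

The index 2 matches the "failure at: 2" in the message, and the two compared values match "0!=4".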
      

       
      Full program:

      import java.io.OutputStream;
      import java.nio.charset.StandardCharsets;
      import java.nio.file.Files;
      import java.nio.file.Paths;
      import java.util.Collections;
      import org.apache.arrow.memory.BufferAllocator;
      import org.apache.arrow.memory.RootAllocator;
      import org.apache.arrow.vector.VarCharVector;
      import org.apache.arrow.vector.VectorSchemaRoot;
      import org.apache.arrow.vector.ipc.ArrowStreamWriter;
      import org.apache.arrow.vector.types.pojo.ArrowType;
      import org.apache.arrow.vector.types.pojo.Field;
      import org.apache.arrow.vector.types.pojo.Schema;
      
      public class AsdfTest {
      
        public static void main(String[] args) throws Exception {
          Schema schema = new Schema(Collections.singletonList(Field.nullable("a", new ArrowType.Utf8())));
      
          try (BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
              VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
            root.setRowCount(2);
            VarCharVector v = (VarCharVector) root.getVector("a");
            v.setSafe(0, "asdf".getBytes(StandardCharsets.UTF_8));
            try (OutputStream output = Files.newOutputStream(Paths.get("./test.bin"))) {
              ArrowStreamWriter writer = new ArrowStreamWriter(root, null, output);
              writer.writeBatch();
              writer.close();
            }
          }
        }
      }
      

      Calling v.setNull(1) after v.setSafe(0, "asdf") does not fix it. Using set instead of setSafe fails in Java.

          People

            fan_li_ya Liya Fan
            lidavidm David Li