Apache Arrow / ARROW-9958

Error writing record batches to IPC streaming format


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.0.1
    • Fix Version/s: None
    • Component/s: GLib, Python
    • Environment:
      pyarrow 1.0.1
      Python 3.7.6
      CentOS Linux release 7.8.2003 (Core)

    Description

      Writing record batches to the Arrow IPC streaming format with on-the-fly compression usually raises one of two errors when the stream is read back.

      The attached scripts reproduce each of the errors below. I can't reproduce the problem with smaller batch sizes, so it probably depends on the size of each record batch. It does not seem specific to pyarrow, since I see a similar issue with the C GLib API.

      Error case 1

      ```
      ~/py376/lib/python3.7/site-packages/pyarrow/ipc.pxi in pyarrow.lib._CRecordBatchReader.read_next_batch()

      ~/py376/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

      OSError: Truncated compressed stream
      ```

      Error case 2

      ```
      ~/py376/lib/python3.7/site-packages/pyarrow/ipc.pxi in pyarrow.lib._RecordBatchStreamReader._open()

      ~/py376/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

      ~/py376/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

      ArrowInvalid: Tried reading schema message, was null or length 0
      ```

      Attachments

        1. example2.py (0.7 kB, Ishan)
        2. example1.py (0.8 kB, Ishan)


          People

            Assignee: Unassigned
            Reporter: Ishan (ananis25)
            Votes: 0
            Watchers: 3
