Apache Arrow / ARROW-2307

[Python] Unable to read arrow stream containing 0 record batches

Details

    Description

      Using the Arrow Java library, I'm creating an Arrow stream with the stream writer.

      Sometimes I don't have anything to serialize, so I don't write any record batches. My Arrow stream then consists of just a schema message:

      <SCHEMA>
      <EOS [optional]: int32>
      

      I am able to deserialize this Arrow stream correctly using the Java stream reader, but reading it with Python fails with an error:

      import pyarrow as pa
      # ...
      reader = pa.open_stream(stream)
      df = reader.read_all().to_pandas()
      

      produces

        File "ipc.pxi", line 307, in pyarrow.lib._RecordBatchReader.read_all
        File "error.pxi", line 77, in pyarrow.lib.check_status
      ArrowInvalid: Must pass at least one record batch
      

      i.e. we're hitting the check in https://github.com/apache/arrow/blob/apache-arrow-0.8.0/cpp/src/arrow/table.cc#L284

      Our current workaround is to always serialize at least one record batch, even if it's empty. However, it would be better either to support streams without record batches, or to disallow them explicitly and make the Java implementation behave the same way.

            People

              wesm Wes McKinney
              bduffield Benjamin Duffield
              Votes: 0
              Watchers: 4
