Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.8.0
Description
Using the Java Arrow library, I'm creating an Arrow stream with the stream writer.
Sometimes there is nothing to serialize, so I don't write any record batches. The Arrow stream then consists of just a schema message:
<SCHEMA> <EOS [optional]: int32>
I am able to deserialize this Arrow stream correctly using the Java stream reader, but reading it with Python fails with an error:
import pyarrow as pa
# ...
reader = pa.open_stream(stream)
df = reader.read_all().to_pandas()
produces
File "ipc.pxi", line 307, in pyarrow.lib._RecordBatchReader.read_all
File "error.pxi", line 77, in pyarrow.lib.check_status
ArrowInvalid: Must pass at least one record batch
i.e. we're hitting the check in https://github.com/apache/arrow/blob/apache-arrow-0.8.0/cpp/src/arrow/table.cc#L284
The workaround we're currently using is to always serialize at least one record batch, even if it's empty. However, I think it would be nice to either support a stream without record batches, or explicitly disallow it and make the Java implementation behave the same way.