Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Fix Version: 3.0.0
Description
Has anyone ever tried to round-trip a record batch between Arrow C# and PyArrow? I can't get PyArrow to read the data correctly.
For context, I'm trying to do inter-process communication of Arrow data frames between C# and Python using shared memory (local TCP/IP is an alternative). Ideally I wouldn't have to serialise the data at all and could share the Arrow in-memory representation directly, but I'm not sure this is even possible with Apache Arrow. Full source code is attached.
C#
using (var stream = sharedMemory.CreateStream(0, 0, MemoryMappedFileAccess.ReadWrite))
{
    var recordBatch = /* ... */;

    // Write the record batch into the shared-memory stream in Arrow IPC format.
    using (var writer = new ArrowFileWriter(stream, recordBatch.Schema, leaveOpen: true))
    {
        writer.WriteRecordBatch(recordBatch);
        writer.WriteEnd();
    }
}
Python
shmem = open_shared_memory(args)
address = get_shared_memory_address(shmem)
buf = pa.foreign_buffer(address, args.sharedMemorySize)
stream = pa.input_stream(buf)
reader = pa.ipc.open_stream(stream)
Unfortunately, it fails with the following error:
pyarrow.lib.ArrowInvalid: Expected to read 1330795073 metadata bytes, but only read 1230.
I can see that the memory content starts with ARROW1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00\x10\x00\x00\x00. It seems that, with the API calls above, PyArrow interprets the first four bytes "ARRO" as the metadata length.
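Decoding the bytes seems to confirm this: the ASCII characters "ARRO", read as a little-endian 32-bit integer, are exactly the bogus metadata length from the error above.

Python
import struct

# The buffer starts with the Arrow file-format magic "ARROW1".
# Reading its first four bytes as a little-endian int32 reproduces
# the 1330795073 "metadata bytes" count reported by pyarrow.
print(struct.unpack("<i", b"ARRO")[0])  # 1330795073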
I assume I'm using the API incorrectly. Has anyone got a working example?
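One thing I suspect (untested sketch below): ArrowFileWriter produces the Arrow IPC file format, which begins with the ARROW1 magic, while pa.ipc.open_stream expects the stream format, so perhaps the buffer should be read with the file reader instead. This reuses the same shared-memory helpers as above, which are my own and not part of pyarrow.

Python
import pyarrow as pa

# open_shared_memory / get_shared_memory_address are the same
# helpers used in the snippet above (not part of pyarrow).
shmem = open_shared_memory(args)
address = get_shared_memory_address(shmem)
buf = pa.foreign_buffer(address, args.sharedMemorySize)

# ArrowFileWriter writes the IPC *file* format, so use the file
# reader rather than pa.ipc.open_stream.
reader = pa.ipc.open_file(pa.BufferReader(buf))
batch = reader.get_batch(0)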
Attachments