Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Fix Version: 3.0.0
Description
Has anyone ever tried to round-trip a record batch between Arrow C# and PyArrow? I can't get PyArrow to read the data correctly.
For context, I'm trying to do inter-process communication of Arrow data frames between C# and Python using shared memory (local TCP/IP is an alternative). Ideally I wouldn't have to serialise the data at all and could share the Arrow in-memory representation directly, but I'm not sure this is even possible with Apache Arrow. Full source code is attached.
C#
using (var stream = sharedMemory.CreateStream(0, 0, MemoryMappedFileAccess.ReadWrite))
{
    var recordBatch = /* ... */;

    // Write the record batch into the shared-memory stream in Arrow IPC format.
    using (var writer = new ArrowFileWriter(stream, recordBatch.Schema, leaveOpen: true))
    {
        writer.WriteRecordBatch(recordBatch);
        writer.WriteEnd();
    }
}
Python
shmem = open_shared_memory(args)
address = get_shared_memory_address(shmem)
buf = pa.foreign_buffer(address, args.sharedMemorySize)
stream = pa.input_stream(buf)
reader = pa.ipc.open_stream(stream)
Unfortunately, it fails with the following error:
pyarrow.lib.ArrowInvalid: Expected to read 1330795073 metadata bytes, but only read 1230.
I can see that the memory content starts with ARROW1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00\x10\x00\x00\x00. It seems that, with the API calls above, PyArrow interprets the first four bytes "ARRO" as the metadata length.
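Decoding the bytes seems to confirm this: the ASCII characters "ARRO", read as a little-endian 32-bit integer, are exactly the bogus metadata length from the error above.

Python
import struct

# The buffer starts with the Arrow file-format magic "ARROW1".
# Reading its first four bytes as a little-endian int32 reproduces
# the 1330795073 "metadata bytes" count reported by pyarrow.
print(struct.unpack("<i", b"ARRO")[0])  # 1330795073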
I assume I'm using the API incorrectly. Has anyone got a working example?
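One thing I suspect (untested sketch below): ArrowFileWriter produces the Arrow IPC file format, which begins with the ARROW1 magic, while pa.ipc.open_stream expects the stream format, so perhaps the buffer should be read with the file reader instead. This reuses the same shared-memory helpers as above, which are my own and not part of pyarrow.

Python
import pyarrow as pa

# open_shared_memory / get_shared_memory_address are the same
# helpers used in the snippet above (not part of pyarrow).
shmem = open_shared_memory(args)
address = get_shared_memory_address(shmem)
buf = pa.foreign_buffer(address, args.sharedMemorySize)

# ArrowFileWriter writes the IPC *file* format, so use the file
# reader rather than pa.ipc.open_stream.
reader = pa.ipc.open_file(pa.BufferReader(buf))
batch = reader.get_batch(0)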
Attachments