Apache Arrow / ARROW-12100

[C#] Cannot round-trip record batch with PyArrow


Details

    Description

      Has anyone ever tried to round-trip a record batch between Arrow C# and PyArrow? I can't get PyArrow to read the data correctly.

      For context, I'm trying to pass Arrow data frames between a C# process and a Python process using shared memory (local TCP/IP is also an alternative). Ideally, I wouldn't have to serialise the data at all and could share the Arrow in-memory representation directly, but I'm not sure this is possible with Apache Arrow. The full source code is attached.

      C#

      using (var stream = sharedMemory.CreateStream(0, 0, MemoryMappedFileAccess.ReadWrite))
      {
          var recordBatch = /* ... */;

          // Write the batch to the shared-memory stream with ArrowFileWriter.
          using (var writer = new ArrowFileWriter(stream, recordBatch.Schema, leaveOpen: true))
          {
              writer.WriteRecordBatch(recordBatch);
              writer.WriteEnd();
          }
      }
      

      Python

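      import pyarrow as pa

      # open_shared_memory and get_shared_memory_address are helpers defined elsewhere (see the attached source).
      shmem = open_shared_memory(args)
      address = get_shared_memory_address(shmem)
      buf = pa.foreign_buffer(address, args.sharedMemorySize)
      stream = pa.input_stream(buf)
      reader = pa.ipc.open_stream(stream)  # fails with the error below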
      

      Unfortunately, it fails with the following error: pyarrow.lib.ArrowInvalid: Expected to read 1330795073 metadata bytes, but only read 1230.

      I can see that the memory content starts with ARROW1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00\x10\x00\x00\x00. It seems that using the API calls above, PyArrow reads "ARRO" as the length of the metadata.
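The arithmetic behind that observation can be checked with a few lines of standard-library Python. This is only a sketch of the decoding step, assuming the reader interprets the first four bytes of the buffer as a little-endian 32-bit length; the variable names are illustrative.

```python
import struct

# First 8 bytes observed in shared memory: "ARROW1" plus two padding bytes.
header = b"ARROW1\x00\x00"

# Decode the first 4 bytes ("ARRO") as a little-endian signed 32-bit integer,
# the way a length prefix would be decoded.
(misread_length,) = struct.unpack("<i", header[:4])

print(misread_length)  # 1330795073, the value in the error message
```

The result matches the "Expected to read 1330795073 metadata bytes" figure, which supports the reading that the magic string is being treated as a message length.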

      I assume I'm using the API incorrectly. Has anyone got a working example?
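One likely explanation is a format mismatch: the Arrow IPC file format begins with the magic string ARROW1, while each message in the IPC stream format begins with a 0xFFFFFFFF continuation marker. The sketch below uses only the standard library to guess which of the two a buffer holds from its first bytes. The helper name `guess_ipc_format` is hypothetical and is not part of PyArrow.

```python
import struct

ARROW_FILE_MAGIC = b"ARROW1"      # magic at the start of the IPC file format
CONTINUATION_MARKER = 0xFFFFFFFF  # prefix of each message in the IPC stream format


def guess_ipc_format(header: bytes) -> str:
    """Return 'file', 'stream', or 'unknown' based on the first bytes of a buffer."""
    if header[:6] == ARROW_FILE_MAGIC:
        return "file"
    if len(header) >= 4 and struct.unpack("<I", header[:4])[0] == CONTINUATION_MARKER:
        return "stream"
    return "unknown"


# The bytes reported in this issue start with the file magic.
observed = b"ARROW1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00\x10\x00\x00\x00"
print(guess_ipc_format(observed))      # file
print(guess_ipc_format(observed[8:]))  # stream
```

If the buffer is in the file format, `pa.ipc.open_file` is the matching PyArrow reader, whereas `pa.ipc.open_stream` expects the stream format, which `ArrowStreamWriter` produces on the C# side. Note that the file format locates its footer from the end of the data, so the length passed to `pa.foreign_buffer` may need to be the number of bytes actually written rather than the full shared-memory size. That last point is an assumption about this setup and has not been verified against the attached code.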

          People

            apitrou Antoine Pitrou
            GPSnoopy Tanguy Fautre
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 20m