  1. Apache Arrow
  2. ARROW-12100

[C#] Cannot round-trip record batch with PyArrow

Details

    Description

      Has anyone ever tried to round-trip a record batch between Arrow C# and PyArrow? I can't get PyArrow to read the data correctly.

      For context, I'm trying to pass Arrow data frames between a C# process and a Python process using shared memory (local TCP/IP is also an alternative). Ideally, I wouldn't have to serialise the data at all and could share the Arrow in-memory representation directly, but I'm not sure that is possible with Apache Arrow. The full source code is attached.

      C#

      using (var stream = sharedMemory.CreateStream(0, 0, MemoryMappedFileAccess.ReadWrite))
      {
          var recordBatch = /* ... */;

          // ArrowFileWriter emits the Arrow IPC file format into the shared memory stream
          using (var writer = new ArrowFileWriter(stream, recordBatch.Schema, leaveOpen: true))
          {
              writer.WriteRecordBatch(recordBatch);
              writer.WriteEnd();
          }
      }
      

      Python

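      import pyarrow as pa

      # open_shared_memory and get_shared_memory_address are helper functions
      # (presumably defined in the attached source)
      shmem = open_shared_memory(args)
      address = get_shared_memory_address(shmem)
      # Wrap the shared memory region in an Arrow buffer without copying it
      buf = pa.foreign_buffer(address, args.sharedMemorySize)
      stream = pa.input_stream(buf)
      reader = pa.ipc.open_stream(stream)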
      

      Unfortunately, it fails with the following error: pyarrow.lib.ArrowInvalid: Expected to read 1330795073 metadata bytes, but only read 1230.

      I can see that the memory content starts with ARROW1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00\x10\x00\x00\x00. It seems that using the API calls above, PyArrow reads "ARRO" as the length of the metadata.
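
      The observation above can be checked with a few lines of standard-library Python. This is only a sketch: the helper name sniff_arrow_ipc is hypothetical, and the interpretation of the header (an "ARROW1" magic padded to 8 bytes, followed by a 0xFFFFFFFF continuation marker and a little-endian int32 metadata length) is my reading of the Arrow IPC format rather than something confirmed in this report.

```python
import struct

FILE_MAGIC = b"ARROW1"               # magic string that opens an Arrow IPC file
CONTINUATION = b"\xff\xff\xff\xff"   # marker that opens a stream-format message


def sniff_arrow_ipc(header: bytes) -> str:
    """Guess the Arrow IPC flavour from the first bytes of a buffer (hypothetical helper)."""
    if header.startswith(FILE_MAGIC):
        return "file"
    if header.startswith(CONTINUATION):
        return "stream"
    return "unknown"


# First bytes of the shared memory, as quoted in the report.
header = b"ARROW1\x00\x00\xff\xff\xff\xff\x08\x01\x00\x00\x10\x00\x00\x00"

# Reading the first four bytes ("ARRO") as a little-endian int32 reproduces
# the bogus metadata length from the error message.
misread_length = struct.unpack("<i", header[:4])[0]

# Skipping the magic and its two padding bytes leaves what looks like a
# stream-style message: continuation marker, then the actual metadata length.
marker = header[8:12]
real_length = struct.unpack("<i", header[12:16])[0]

print(sniff_arrow_ipc(header), misread_length, real_length)
```

      If this reading is right, the writer produced the file format while the reader expects the stream format. Opening the buffer with pa.ipc.open_file instead of pa.ipc.open_stream, or writing with ArrowStreamWriter on the C# side, may be worth trying.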

      I assume I'm using the API incorrectly. Has anyone got a working example?

      Attachments

        1. ArrowSharedMemory_20210326_2.zip
          20 kB
          Tanguy Fautre
        2. ArrowSharedMemory_20210326.zip
          20 kB
          Tanguy Fautre
        3. ArrowSharedMemory_20210329.zip
          19 kB
          Tanguy Fautre

            People

              apitrou Antoine Pitrou
              GPSnoopy Tanguy Fautre

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m