Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 7.0.0
Description
When writing an IPC file that contains multiple record batches, the schema provided to `IpcFormatWriter` is correctly written to the IPC file's footer. However, if a record batch carries its own batch-specific metadata, that metadata is not written.
This can be reproduced with the following test case (using pyarrow):
    import pyarrow as pa

    def test_chunked_record_batch_meta():
        num_batches = 2
        chunk_size = 10  # value assumed; the test as reported leaves chunk_size undefined
        ipc_file = "/tmp/batches_with_metadata.arrow"
        int_array = pa.array([i for i in range(chunk_size)])
        # Schema-level metadata: this part is written to the file footer.
        schema = pa.schema(
            [
                ("values", pa.int64()),
            ],
            metadata={"foo": "bar"},
        )
        writer = pa.RecordBatchFileWriter(
            ipc_file, schema
        )
        for i in range(num_batches):
            # follow examples here:
            # https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_table.py
            # Batch-specific metadata: this part is not written.
            batch = pa.record_batch(
                [int_array],
                names=["values"],
                metadata={"batch_id": str(i)},
            )
            writer.write_batch(batch)
        writer.close()
        # Read the file back and inspect the first batch.
        mmapped_file = pa.memory_map(ipc_file)
        reader = pa.ipc.open_file(mmapped_file)
        batch_0 = reader.get_record_batch(0)
        assert batch_0.schema.metadata
Issue Links
- is related to
  - ARROW-16220 [C++] IPC listener interface should allow receiving custom_metadata (Open)
  - ARROW-16430 [Python] Read/Write record batch custom metadata API in pyarrow (Resolved)
- supersedes
  - ARROW-6940 [C++] Expose Message-level IPC metadata in both read and write interfaces (Closed)
- links to