  1. Apache Arrow
  2. ARROW-16131

[C++] Record batch specific metadata is not saved in IPC file

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.0.0
    • Fix Version/s: 8.0.0
    • Component/s: C++

    Description

      When writing an IPC file with multiple record batches, the schema passed to `IpcFormatWriter` is correctly written to the IPC file's footer. However, if a record batch carries its own batch-specific metadata, that metadata is not written.

      This can be reproduced with the following test case (using pyarrow):

      import pyarrow as pa


      def test_chunked_record_batch_meta():
          num_batches = 2
          chunk_size = 10  # assumed value; chunk_size is not defined in the original report
          ipc_file = "/tmp/batches_with_metadata.arrow"
          int_array = pa.array(list(range(chunk_size)))
          # Schema-level metadata: this part is written to the IPC file footer.
          schema = pa.schema(
              [
                  ("values", pa.int64()),
              ],
              metadata={"foo": "bar"},
          )
          writer = pa.RecordBatchFileWriter(ipc_file, schema)
          for i in range(num_batches):
              # follow examples here:
              # https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_table.py
              batch = pa.record_batch(
                  [int_array],
                  names=["values"],
                  metadata={"batch_id": str(i)},  # batch-specific metadata
              )
              writer.write_batch(batch)
          writer.close()
          # Read the file back and inspect the metadata of the first batch.
          mmapped_file = pa.memory_map(ipc_file)
          reader = pa.ipc.open_file(mmapped_file)
          batch_0 = reader.get_record_batch(0)
          assert batch_0.schema.metadata

            People

              Assignee: Yue Ni (niyue)
              Reporter: Yue Ni (niyue)
                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 6h