Apache Arrow / ARROW-8749

[C++] IpcFormatWriter writes dictionary batches with wrong ID


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0, 0.17.0
    • Fix Version/s: 2.0.0
    • Component/s: C++
    • Labels: None

    Description

      IpcFormatWriter assigns dictionary IDs once when it writes the schema message. Then, when it writes dictionary batches, it re-collects dictionaries from the given batch and assigns IDs again. For example, with 5 dictionaries, the first dictionary ends up with ID 0 in the schema message but is written with ID 5 in its dictionary batch.

      For example, the following test fails with "'_error_or_value11.status()' failed with Key error: No record of dictionary type with id 9":

      // Round-trips dictionary-encoded batches through the IPC stream format.
      TEST_F(TestMetadata, DoPutDictionaries) {
        ASSERT_OK_AND_ASSIGN(auto sink, arrow::io::BufferOutputStream::Create());
        std::shared_ptr<Schema> schema = ExampleDictSchema();
        BatchVector expected_batches;
        ASSERT_OK(ExampleDictBatches(&expected_batches));

        // Write the schema and batches to an in-memory stream.
        ASSERT_OK_AND_ASSIGN(auto writer, arrow::ipc::NewStreamWriter(sink.get(), schema));
        for (auto& batch : expected_batches) {
          ASSERT_OK(writer->WriteRecordBatch(*batch));
        }
        ASSERT_OK(writer->Close());
        ASSERT_OK_AND_ASSIGN(auto buf, sink->Finish());

        // Read the stream back; resolving the dictionaries fails here because
        // the dictionary batches were written with the wrong IDs.
        arrow::io::BufferReader source(buf);
        ASSERT_OK_AND_ASSIGN(auto reader, arrow::ipc::RecordBatchStreamReader::Open(&source));
        AssertSchemaEqual(schema, reader->schema());
        for (auto& batch : expected_batches) {
          ASSERT_OK_AND_ASSIGN(auto actual, reader->Next());
          AssertBatchesEqual(*actual, *batch);
        }
      }

People

    Assignee: Antoine Pitrou (apitrou)
    Reporter: David Li (lidavidm)
