Apache Arrow / ARROW-7522

[C++][Plasma] Broken Record Batch returned from a function call


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.15.1
    • Fix Version/s: None
    • Component/s: C++, C++ - Plasma
    • Labels: None
    • Environment: macOS

    Description

      Scenario: retrieving a Record Batch from Plasma with a known Object ID.

      The following code snippet works well:

      #include <iostream>
      #include <memory>

      #include <arrow/api.h>
      #include <arrow/io/memory.h>
      #include <arrow/ipc/reader.h>
      #include <arrow/util/logging.h>
      #include <plasma/client.h>

      int main(int argc, char **argv)
      {
          // Plasma object IDs are 20 bytes long.
          plasma::ObjectID object_id = plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");

          // Start up and connect a Plasma client.
          plasma::PlasmaClient client;
          ARROW_CHECK_OK(client.Connect("/tmp/store"));

          // Wait indefinitely (timeout_ms = -1) until the object is available.
          plasma::ObjectBuffer object_buffer;
          ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));

          // Retrieve object data.
          auto buffer = object_buffer.data;

          // Read the first record batch from the IPC stream stored in the object.
          arrow::io::BufferReader buffer_reader(buffer);
          std::shared_ptr<arrow::ipc::RecordBatchReader> record_batch_stream_reader;
          ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(&buffer_reader, &record_batch_stream_reader));

          std::shared_ptr<arrow::RecordBatch> record_batch;
          ARROW_CHECK_OK(record_batch_stream_reader->ReadNext(&record_batch));
          ARROW_CHECK(record_batch != nullptr);  // nullptr would mean the stream held no batches

          std::cout << "record_batch->column_name(0): " << record_batch->column_name(0) << std::endl;
          std::cout << "record_batch->num_columns(): " << record_batch->num_columns() << std::endl;
          std::cout << "record_batch->num_rows(): " << record_batch->num_rows() << std::endl;
          std::cout << "record_batch->column(0)->length(): "
                    << record_batch->column(0)->length() << std::endl;
          std::cout << "record_batch->column(0)->ToString(): "
                    << record_batch->column(0)->ToString() << std::endl;
      }
      

      However, if retrieving the Record Batch is wrapped in a function, calling record_batch->column(0)->ToString() causes a segmentation fault:

      #include <iostream>
      #include <memory>

      #include <arrow/api.h>
      #include <arrow/io/memory.h>
      #include <arrow/ipc/reader.h>
      #include <arrow/util/logging.h>
      #include <plasma/client.h>

      std::shared_ptr<arrow::RecordBatch> GetRecordBatchFromPlasma(plasma::ObjectID object_id)
      {
          // Start up and connect a Plasma client.
          plasma::PlasmaClient client;
          ARROW_CHECK_OK(client.Connect("/tmp/store"));

          plasma::ObjectBuffer object_buffer;
          ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));

          // Retrieve object data.
          auto buffer = object_buffer.data;

          arrow::io::BufferReader buffer_reader(buffer);
          std::shared_ptr<arrow::ipc::RecordBatchReader> record_batch_stream_reader;
          ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(&buffer_reader, &record_batch_stream_reader));

          std::shared_ptr<arrow::RecordBatch> record_batch;
          ARROW_CHECK_OK(record_batch_stream_reader->ReadNext(&record_batch));
          ARROW_CHECK(record_batch != nullptr);  // nullptr would mean the stream held no batches

          // Disconnect the client. The local `client` is also destroyed when this function returns.
          ARROW_CHECK_OK(client.Disconnect());

          return record_batch;
      }

      int main(int argc, char **argv)
      {
          plasma::ObjectID object_id = plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");

          std::shared_ptr<arrow::RecordBatch> record_batch = GetRecordBatchFromPlasma(object_id);

          // These metadata accessors still print as expected.
          std::cout << "record_batch->column_name(0): " << record_batch->column_name(0) << std::endl;
          std::cout << "record_batch->num_columns(): " << record_batch->num_columns() << std::endl;
          std::cout << "record_batch->num_rows(): " << record_batch->num_rows() << std::endl;
          std::cout << "record_batch->column(0)->length(): "
                    << record_batch->column(0)->length() << std::endl;
          // Reading the column contents crashes here (EXC_BAD_ACCESS under lldb).
          std::cout << "record_batch->column(0)->ToString(): "
                    << record_batch->column(0)->ToString() << std::endl;
      }
      

      The Record Batch's metadata, such as the number of columns and rows, is still available, but I can't read the contents of the columns.
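
      A possible explanation, offered as an assumption rather than a confirmed diagnosis: the schema and the row and column counts are held in ordinary heap objects owned by the RecordBatch, whereas the column buffers are zero-copy views into the shared-memory mapping that the Plasma client owns. If that mapping is released when client is disconnected and destroyed at the end of GetRecordBatchFromPlasma, the metadata stays readable but the column data dangles. The library-free model below sketches that split; MappedRegion, BatchView, and ReadBatchInFunction are hypothetical names, not Arrow or Plasma APIs.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a shared-memory mapping owned by a client object.
// In this model, destroying the owner releases the memory.
struct MappedRegion {
    std::vector<std::int64_t> values;
};

// Hypothetical stand-in for a record batch: the row count is copied into the
// batch itself, while the column data is only a non-owning pointer.
struct BatchView {
    std::int64_t num_rows;               // copied: remains valid on its own
    const std::int64_t* column_data;     // borrowed: valid only while the region lives
    std::weak_ptr<MappedRegion> region;  // used only to observe the region's lifetime
};

// Mirrors the shape of GetRecordBatchFromPlasma: the owner of the memory is a
// local variable, so it is destroyed when the function returns.
BatchView ReadBatchInFunction() {
    auto region = std::make_shared<MappedRegion>(MappedRegion{{10, 20, 30}});
    BatchView batch{static_cast<std::int64_t>(region->values.size()),
                    region->values.data(), region};
    return batch;  // `region` is released here; `column_data` now dangles
}
```

      In this model, num_rows is still 3 after ReadBatchInFunction returns while region.expired() is true, so reading through column_data would be undefined behavior; a crash such as EXC_BAD_ACCESS would be consistent with that.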

      lldb reports the stop reason as EXC_BAD_ACCESS, so I suspect the Record Batch is destroyed once GetRecordBatchFromPlasma returns. But then why is its metadata still accessible?
      What is the proper way to get the Record Batch if we insist on using a function?
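
      One way to approach the second question, sketched under the same unconfirmed assumption: whatever keeps the memory valid must live at least as long as the returned batch. For Plasma that would mean leaving the plasma::PlasmaClient connected and in scope for as long as the RecordBatch is used (for example, creating the client in main and passing it to the function by reference), or copying the column data out of shared memory before disconnecting. The model below shows the first pattern, where the batch shares ownership of the region it points into; MappedRegion, OwnedBatch, and ReadBatchKeepingOwner are again hypothetical names, not Arrow or Plasma APIs.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the client-owned shared-memory mapping.
struct MappedRegion {
    std::vector<std::int64_t> values;
};

// Hypothetical batch that shares ownership of the region it points into, so
// the borrowed pointer stays valid for as long as the batch exists.
struct OwnedBatch {
    std::shared_ptr<const MappedRegion> keep_alive;  // extends the region's lifetime
    std::int64_t num_rows;
    const std::int64_t* column_data;  // points into *keep_alive

    // Reads the column contents; safe because keep_alive pins the memory.
    std::int64_t Sum() const {
        std::int64_t total = 0;
        for (std::int64_t i = 0; i < num_rows; ++i) total += column_data[i];
        return total;
    }
};

OwnedBatch ReadBatchKeepingOwner() {
    auto region = std::make_shared<MappedRegion>(MappedRegion{{10, 20, 30}});
    // The batch copies the shared_ptr, so the region survives this function.
    return OwnedBatch{region, static_cast<std::int64_t>(region->values.size()),
                      region->values.data()};
}
```

      Bundling a shared_ptr to the owner with the view avoids copying the column data; the trade-off is that the memory (in the Plasma case, the client's mapping) stays pinned until the last batch referencing it is gone.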


          People

            Assignee: Unassigned
            Reporter: Chengxin Ma (cxma)
            Votes: 0
            Watchers: 4
