  1. Apache Arrow
  2. ARROW-6854

[Dataset][C++] RecordBatchProjector is not thread safe

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

    Description

      While working on ARROW-6769, I noticed that `RecordBatchProjector` is not thread safe. My goal is to use this class to wrap a ScanTaskIterator in another ScanTaskIterator that projects, so that producers (fragments) don't have to know about the projection schema. The issue is that ScanTasks are expected to run on concurrent threads, so the projector will be invoked by multiple threads at once.

      The lack of thread safety comes from the projector's adaptivity to input schemas: `SetInputSchema` stores its result in a local cache, which concurrent callers would modify without synchronization. I suggest we refactor it into two classes:

      1. `RecordBatchProjector`, which works with a static `from` schema, i.e. no adaptivity. The schema is defined at construction time. This class is thread safe to invoke after construction, since it performs no local modification.
      2. `AdaptiveRecordBatchProjector`, which holds a cache `map[schema_hash, std::shared_ptr<RecordBatchProjector>]` protected by a mutex.
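
      The proposed split could look roughly like the sketch below. It is not Arrow code: `Schema` and `RecordBatch` are simplified stand-ins for the Arrow types, `HashSchema`, `GetOrCreate`, `cache_size`, and `ProjectConcurrently` are hypothetical helpers, and zero-filling absent fields is an arbitrary placeholder for whatever the real projector does. Keying the cache on the hash alone, as suggested above, also ignores hash collisions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-ins for arrow::Schema and arrow::RecordBatch (assumptions).
using Schema = std::vector<std::string>;  // ordered field names
struct RecordBatch {
  Schema schema;
  std::vector<std::vector<int>> columns;  // one column per field
};

// Static projector: the `from` schema is fixed at construction time.
// Project() is const and touches no mutable state, so concurrent calls are safe.
class RecordBatchProjector {
 public:
  RecordBatchProjector(Schema from, Schema to)
      : from_(std::move(from)), to_(std::move(to)) {
    for (const auto& name : to_) {
      int index = -1;  // -1 marks a field that is absent from `from`
      for (std::size_t i = 0; i < from_.size(); ++i) {
        if (from_[i] == name) index = static_cast<int>(i);
      }
      indices_.push_back(index);
    }
  }

  RecordBatch Project(const RecordBatch& batch) const {
    assert(batch.schema == from_);  // no adaptivity: input must match `from`
    const std::size_t rows = batch.columns.empty() ? 0 : batch.columns[0].size();
    RecordBatch out{to_, {}};
    for (int index : indices_) {
      // Absent fields are zero-filled in this sketch.
      out.columns.push_back(index < 0
                                ? std::vector<int>(rows, 0)
                                : batch.columns[static_cast<std::size_t>(index)]);
    }
    return out;
  }

 private:
  Schema from_, to_;
  std::vector<int> indices_;
};

// Adaptive projector: one static projector per input schema, cached under a mutex.
class AdaptiveRecordBatchProjector {
 public:
  explicit AdaptiveRecordBatchProjector(Schema to) : to_(std::move(to)) {}

  RecordBatch Project(const RecordBatch& batch) {
    // The lock is held only for the cache lookup, not for the projection itself.
    return GetOrCreate(batch.schema)->Project(batch);
  }

  std::size_t cache_size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

 private:
  static std::size_t HashSchema(const Schema& schema) {
    std::size_t h = 0;
    for (const auto& name : schema) {
      h ^= std::hash<std::string>{}(name) + 0x9e3779b9 + (h << 6) + (h >> 2);
    }
    return h;
  }

  std::shared_ptr<RecordBatchProjector> GetOrCreate(const Schema& from) {
    const std::size_t key = HashSchema(from);
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      it = cache_.emplace(key, std::make_shared<RecordBatchProjector>(from, to_)).first;
    }
    return it->second;
  }

  Schema to_;
  mutable std::mutex mutex_;
  std::unordered_map<std::size_t, std::shared_ptr<RecordBatchProjector>> cache_;
};

// Projects every input on its own thread through one shared adaptive projector.
std::vector<RecordBatch> ProjectConcurrently(AdaptiveRecordBatchProjector& projector,
                                             const std::vector<RecordBatch>& inputs) {
  std::vector<RecordBatch> outputs(inputs.size());
  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    threads.emplace_back([&projector, &inputs, &outputs, i] {
      outputs[i] = projector.Project(inputs[i]);
    });
  }
  for (auto& t : threads) t.join();
  return outputs;
}
```

      Because `GetOrCreate` returns a `std::shared_ptr`, each thread keeps its static projector alive independently of the cache, and the mutex never serializes the projection work itself.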

          People

            Assignee: Unassigned
            Reporter: Francois Saint-Jacques (fsaintjacques)

            Dates

              Created:
              Updated: