Apache Arrow / ARROW-6854

[Dataset][C++] RecordBatchProjector is not thread safe


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++
    • Labels:
      None

      Description

      While working on ARROW-6769 I noticed that RecordBatchProjector is not thread safe. My goal is to use this class to wrap the ScanTaskIterator in another ScanTaskIterator that projects, so that producers (fragments) don't have to know about the target schema. The issue is that ScanTasks are expected to run on concurrent threads, so the projector will be invoked by multiple threads.
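A minimal sketch of the wrapping idea. It uses hypothetical stand-in types (`Batch`, `ScanTask`, `ScanTaskIterator`, `Projection`) and a hypothetical helper `MakeProjectingIterator`; none of these are Arrow's actual API. The wrapper decorates every task yielded by the inner iterator so that its output is projected, and all decorated tasks share one projection object.

```cpp
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not Arrow's API: a batch is a vector of ints, a scan
// task produces one batch when invoked, and an iterator returns the next task
// or std::nullopt once exhausted.
using Batch = std::vector<int>;
using ScanTask = std::function<Batch()>;
using ScanTaskIterator = std::function<std::optional<ScanTask>()>;
using Projection = std::function<Batch(const Batch&)>;

// Decorates `inner`: each yielded task runs the original task, then applies the
// shared projection, so producers never see the target schema. The returned
// tasks may run on different threads, so `project` is called concurrently.
ScanTaskIterator MakeProjectingIterator(ScanTaskIterator inner,
                                        std::shared_ptr<const Projection> project) {
  return [inner = std::move(inner),
          project = std::move(project)]() -> std::optional<ScanTask> {
    std::optional<ScanTask> next = inner();
    if (!next) return std::nullopt;  // inner iterator exhausted
    return ScanTask([task = std::move(*next), project] { return (*project)(task()); });
  };
}
```

Any `Projection` passed in must therefore be safe to call from several threads at once, which is the property the current RecordBatchProjector lacks.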

      The lack of thread safety comes from the projector adapting to input schemas: `SetInputSchema` stores its state in a local cache, i.e. it mutates the projector. I suggest we refactor it into two classes:

      1. `RecordBatchProjector`, which works with a static `from` schema defined at construction time, i.e. no adaptivity. Since no local state is modified after construction, it is thread safe to invoke.
      2. `AdaptiveRecordBatchProjector`, which holds a cache `map[schema_hash, std::shared_ptr<RecordBatchProjector>]` protected by a mutex.
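A minimal sketch of the proposed split, under simplifying assumptions: `ToySchema`, `Column` and `ToyBatch` are hypothetical stand-ins for Arrow's schema and record batch, a field missing from the input is filled with nulls (`std::nullopt`), and the schema hash is an ad hoc combination of `std::hash` over field names. The class names follow the proposal, but the signatures are illustrative, not Arrow's API.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Arrow types: a schema is a list of field names and
// a batch holds one nullable int64 column per field.
using ToySchema = std::vector<std::string>;
using Column = std::vector<std::optional<long long>>;
struct ToyBatch {
  ToySchema schema;
  std::vector<Column> columns;
};

// (1) Static projector: `from` and `to` are fixed at construction and Project()
// only reads immutable members, so concurrent calls are safe.
class RecordBatchProjector {
 public:
  RecordBatchProjector(ToySchema from, ToySchema to)
      : from_(std::move(from)), to_(std::move(to)) {
    // Resolve each target field to its index in `from` once, up front.
    for (const auto& name : to_) {
      std::optional<std::size_t> index;
      for (std::size_t i = 0; i < from_.size() && !index; ++i)
        if (from_[i] == name) index = i;
      indices_.push_back(index);
    }
  }

  const ToySchema& from() const { return from_; }

  // Precondition: batch.schema == from().
  ToyBatch Project(const ToyBatch& batch) const {
    std::size_t rows = batch.columns.empty() ? 0 : batch.columns[0].size();
    ToyBatch out{to_, {}};
    for (const auto& index : indices_) {
      // Fields absent from the input become all-null columns.
      out.columns.push_back(index ? batch.columns[*index] : Column(rows));
    }
    return out;
  }

 private:
  ToySchema from_, to_;
  std::vector<std::optional<std::size_t>> indices_;
};

// (2) Adaptive projector: caches one immutable projector per input schema,
// keyed by a schema hash; the cache is guarded by a mutex.
class AdaptiveRecordBatchProjector {
 public:
  explicit AdaptiveRecordBatchProjector(ToySchema to) : to_(std::move(to)) {}

  ToyBatch Project(const ToyBatch& batch) const {
    // The lock is released when Get() returns; the projection itself runs unlocked.
    return Get(batch.schema)->Project(batch);
  }

  std::size_t cache_size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

 private:
  static std::size_t Hash(const ToySchema& schema) {
    std::size_t h = 0;
    for (const auto& name : schema) h = h * 31 + std::hash<std::string>{}(name);
    return h;
  }

  std::shared_ptr<const RecordBatchProjector> Get(const ToySchema& from) const {
    const std::size_t key = Hash(from);
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      if (it->second->from() == from) return it->second;
      // Hash collision: build an uncached projector rather than reuse a wrong one.
      return std::make_shared<RecordBatchProjector>(from, to_);
    }
    auto projector = std::make_shared<RecordBatchProjector>(from, to_);
    cache_.emplace(key, projector);
    return projector;
  }

  ToySchema to_;
  mutable std::mutex mutex_;
  mutable std::unordered_map<std::size_t, std::shared_ptr<const RecordBatchProjector>> cache_;
};
```

Because a hash alone can collide, the sketch checks the cached projector's `from()` schema before reusing it and falls back to an uncached projector on a mismatch. The mutex is held only while looking up or building a projector; `Project()` on the cached, immutable projector runs outside the lock.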

            People

            • Assignee:
              Unassigned
              Reporter:
              Francois Saint-Jacques (fsaintjacques)
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated: