Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
While working on ARROW-6769, I noticed that `RecordBatchProjector` is not thread-safe. My goal is to use this class to wrap the `ScanTaskIterator` in another `ScanTaskIterator` that projects each batch, so that producers (fragments) don't have to know about the target schema. The issue is that ScanTasks are expected to run on concurrent threads, so the projector will be invoked by multiple threads at once.
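The lack of thread safety comes from the projector adapting to varying input schemas: `SetInputSchema` stores the state derived from the current input schema in a local cache, which concurrent callers mutate without synchronization. I suggest we refactor it into two classes: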
- `RecordBatchProjector`, which will work with a static `from` schema, i.e. no adaptivity. The schema is fixed at construction time, and since no internal state is modified afterwards, the class is safe to invoke concurrently once constructed.
- `AdaptiveRecordBatchProjector`, which will hold a cache `map[schema_hash, std::shared_ptr<RecordBatchProjector>]` protected by a mutex.
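A minimal sketch of the proposed split, assuming simplified stand-ins rather than Arrow's real types: `Schema`, `RecordBatch` (integer columns only), `SchemaHash`, `Project` and `GetProjector` are all hypothetical names chosen for illustration, and absent fields are filled with `0` as a stand-in for nulls. The static projector resolves its column mapping once in the constructor and only reads it afterwards, so its `const` `Project()` can run on many threads. The adaptive wrapper takes the mutex only to look up or create a cached static projector, then projects outside the lock.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for arrow::Schema and arrow::RecordBatch.
struct Schema {
  std::vector<std::string> field_names;
  bool operator==(const Schema& other) const { return field_names == other.field_names; }
};

struct RecordBatch {
  std::shared_ptr<const Schema> schema;
  std::vector<std::vector<int>> columns;  // one column per schema field
};

// Combines field-name hashes; plays the role of schema_hash in the proposal.
struct SchemaHash {
  std::size_t operator()(const Schema& schema) const {
    std::size_t h = 0;
    for (const auto& name : schema.field_names) h = h * 31 + std::hash<std::string>{}(name);
    return h;
  }
};

// Static projector: `from` and `to` are fixed at construction and never mutated,
// so the const Project() may be called from many threads at once.
class RecordBatchProjector {
 public:
  RecordBatchProjector(std::shared_ptr<const Schema> from, std::shared_ptr<const Schema> to)
      : from_(std::move(from)), to_(std::move(to)) {
    assert(from_ != nullptr && to_ != nullptr);
    // Resolve each target field to its index in `from` (-1 if absent), once.
    for (const auto& name : to_->field_names) {
      int index = -1;
      for (std::size_t i = 0; i < from_->field_names.size(); ++i) {
        if (from_->field_names[i] == name) {
          index = static_cast<int>(i);
          break;
        }
      }
      column_indices_.push_back(index);
    }
  }

  RecordBatch Project(const RecordBatch& batch) const {
    if (!(*batch.schema == *from_)) {
      throw std::invalid_argument("batch schema does not match the projector's from schema");
    }
    const std::size_t num_rows = batch.columns.empty() ? 0 : batch.columns[0].size();
    RecordBatch out{to_, {}};
    for (int index : column_indices_) {
      if (index >= 0) {
        out.columns.push_back(batch.columns[static_cast<std::size_t>(index)]);
      } else {
        out.columns.emplace_back(num_rows, 0);  // absent field: 0 stands in for nulls
      }
    }
    return out;
  }

 private:
  std::shared_ptr<const Schema> from_;
  std::shared_ptr<const Schema> to_;
  std::vector<int> column_indices_;
};

// Adaptive projector: caches one static projector per input schema.
// The mutex guards only the cache; the projection itself runs outside the lock.
class AdaptiveRecordBatchProjector {
 public:
  explicit AdaptiveRecordBatchProjector(std::shared_ptr<const Schema> to) : to_(std::move(to)) {}

  RecordBatch Project(const RecordBatch& batch) {
    return GetProjector(batch.schema)->Project(batch);
  }

 private:
  std::shared_ptr<const RecordBatchProjector> GetProjector(
      const std::shared_ptr<const Schema>& from) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(*from);
    if (it == cache_.end()) {
      it = cache_.emplace(*from, std::make_shared<RecordBatchProjector>(from, to_)).first;
    }
    return it->second;
  }

  std::shared_ptr<const Schema> to_;
  std::mutex mutex_;
  // Bucketed by schema hash; full schema equality resolves hash collisions.
  std::unordered_map<Schema, std::shared_ptr<const RecordBatchProjector>, SchemaHash> cache_;
};
```

Returning the cached projector as a `shared_ptr` lets a caller keep using it after the lock is released, so the critical section stays limited to the cache lookup.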