Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Description
The Split Readers hand over a batch of records at a time from the I/O thread (fetching and decoding) to the main operator processing thread.
These batch structures can be memory-intensive and expensive to allocate, so performance benefits greatly from reusing them. This is especially true for high-performance format readers like ORC and Parquet.
While previous sources (where I/O was in the main thread) could reuse objects in a trivial manner, the new Split Reader API (with multiple threads) needs an explicit recycle() hook to allow returning/reusing these objects.
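To illustrate the idea, here is a minimal sketch in plain Java of a batch that is handed from an I/O thread to a processing thread and then returned through a recycle() call. It does not use the actual Split Reader interfaces: the names BatchRecyclingSketch, Batch, run, pool and handover are assumptions made up for this example, and two blocking queues stand in for whatever hand-over mechanism the real implementation uses.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BatchRecyclingSketch {

    /** Hypothetical reusable batch; stands in for an expensive structure such as a columnar vector. */
    static final class Batch {
        final long[] values;
        int size;
        private final BlockingQueue<Batch> pool;

        Batch(int capacity, BlockingQueue<Batch> pool) {
            this.values = new long[capacity];
            this.pool = pool;
        }

        /** Explicit hook: the consumer calls this once it is done, making the batch reusable. */
        void recycle() {
            size = 0;
            pool.add(this);
        }
    }

    /** Returns {sum of all records, number of distinct Batch objects the consumer saw}. */
    static long[] run(int numBatches, int batchSize, int poolSize) {
        BlockingQueue<Batch> pool = new ArrayBlockingQueue<>(poolSize);
        BlockingQueue<Batch> handover = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(new Batch(batchSize, pool));
        }

        // I/O thread: takes a free batch, fills it, and hands it over.
        Thread io = new Thread(() -> {
            try {
                long next = 0;
                for (int b = 0; b < numBatches; b++) {
                    Batch batch = pool.take(); // blocks until a batch has been recycled
                    for (int i = 0; i < batchSize; i++) {
                        batch.values[i] = next++;
                    }
                    batch.size = batchSize;
                    handover.put(batch);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        io.start();

        // Main processing thread: consumes each batch, then recycles it.
        Set<Batch> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        long sum = 0;
        try {
            for (int b = 0; b < numBatches; b++) {
                Batch batch = handover.take();
                seen.add(batch);
                for (int i = 0; i < batch.size; i++) {
                    sum += batch.values[i];
                }
                batch.recycle();
            }
            io.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {sum, seen.size()};
    }

    public static void main(String[] args) {
        long[] result = run(10, 4, 2);
        System.out.println("sum=" + result[0] + ", distinct batches=" + result[1]);
    }
}
```

In this sketch, ten batches of records flow through only two allocated Batch objects. Without the recycle() call the I/O thread would block forever on the empty pool (or would have to allocate a fresh batch per hand-over), which is why the hook has to be explicit once fetching and processing run on different threads.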