Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
The GobblinOrcWriter contains a converter and a buffer row batch. The buffer holds records converted from Avro to ORC before they are added to the native ORC writer.
Because the buffer holds multiple records, the columns of the row batch must be resized repeatedly as records accumulate. This hurts both performance and memory: when the enlarge factor is too low, resizing happens too often; when it is too high, resizing is rare but the buffer over-allocates and dominates the container memory.
Because the number of records that can sit in the buffer before it is flushed is bounded, we want the resizing algorithm to become less aggressive as more records are processed.
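One way to picture the proposal is an enlarge factor that decays geometrically with the number of records already buffered. The sketch below is illustrative only and is not the GobblinOrcWriter code: the class name DecayingResizePolicy, its parameters (initialFactor, minFactor, decay), and the geometric decay formula are all assumptions made for this example.

```java
/**
 * Hypothetical sketch of a resize policy whose enlarge factor shrinks
 * as more records are buffered. Not part of Gobblin; all names are assumed.
 */
final class DecayingResizePolicy {
    private final double initialFactor; // factor applied when the buffer is empty
    private final double minFactor;     // lower bound the factor decays toward (>= 1.0)
    private final double decay;         // per-record multiplier in (0, 1]

    DecayingResizePolicy(double initialFactor, double minFactor, double decay) {
        if (minFactor < 1.0 || initialFactor < minFactor || decay <= 0.0 || decay > 1.0) {
            throw new IllegalArgumentException("invalid resize policy parameters");
        }
        this.initialFactor = initialFactor;
        this.minFactor = minFactor;
        this.decay = decay;
    }

    /** Enlarge factor after the given number of buffered records. */
    double factorFor(int recordsBuffered) {
        // The headroom above minFactor shrinks geometrically with each record.
        double headroom = (initialFactor - minFactor) * Math.pow(decay, recordsBuffered);
        return minFactor + headroom;
    }

    /** Capacity to allocate when a column needs at least requiredSize slots. */
    int newCapacity(int requiredSize, int recordsBuffered) {
        long target = (long) Math.ceil(requiredSize * factorFor(recordsBuffered));
        // Never allocate less than what is required, and stay within int range.
        return (int) Math.min(Integer.MAX_VALUE, Math.max(requiredSize, target));
    }
}
```

With the assumed parameters new DecayingResizePolicy(3.0, 1.0, 0.5), a column that needs 100 slots is grown to 300 when the buffer is empty, 200 after one buffered record, and 150 after two. Early resizes over-allocate to avoid repeated growth, while later ones stay close to the required size so the buffer does not dominate container memory.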