Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Invalid
Description
Once https://github.com/apache/arrow/pull/9043 is merged, we should extend this to wrap HashJoinExec and HashAggregateExec as well, since both can produce small batches.
Rather than hard-coding a list of operators that need to be wrapped, we should find a more generic mechanism that lets plans declare whether their input and/or output batches should be coalesced (similar to how we handle partitioning). This would also allow custom operators outside of DataFusion to benefit from this optimization.
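As a rough illustration of the generic mechanism described above, here is a minimal, self-contained sketch. It does not use DataFusion's real API: the `PlanNode` trait, its `coalesce_output` method, the `Op` and `Coalesce` types, and the `add_coalesce` pass are all hypothetical names invented for this example. It only covers the output side; an input-side declaration could presumably follow the same pattern.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a physical plan trait (NOT DataFusion's ExecutionPlan).
trait PlanNode {
    fn name(&self) -> String;
    fn children(&self) -> Vec<Arc<dyn PlanNode>>;
    fn with_children(&self, children: Vec<Arc<dyn PlanNode>>) -> Arc<dyn PlanNode>;
    /// Operators that may emit small batches override this to return true.
    /// The default means "no coalescing needed".
    fn coalesce_output(&self) -> bool {
        false
    }
}

/// Generic operator for the demo; `small_output` is its declaration.
struct Op {
    name: &'static str,
    children: Vec<Arc<dyn PlanNode>>,
    small_output: bool,
}

impl PlanNode for Op {
    fn name(&self) -> String {
        self.name.to_string()
    }
    fn children(&self) -> Vec<Arc<dyn PlanNode>> {
        self.children.clone()
    }
    fn with_children(&self, children: Vec<Arc<dyn PlanNode>>) -> Arc<dyn PlanNode> {
        Arc::new(Op { name: self.name, children, small_output: self.small_output })
    }
    fn coalesce_output(&self) -> bool {
        self.small_output
    }
}

/// Hypothetical wrapper node that would merge small batches into larger ones.
struct Coalesce {
    input: Arc<dyn PlanNode>,
    target_batch_size: usize,
}

impl PlanNode for Coalesce {
    fn name(&self) -> String {
        format!("Coalesce({})", self.target_batch_size)
    }
    fn children(&self) -> Vec<Arc<dyn PlanNode>> {
        vec![self.input.clone()]
    }
    // Expects exactly one child; panics on an empty vector (acceptable for a sketch).
    fn with_children(&self, mut children: Vec<Arc<dyn PlanNode>>) -> Arc<dyn PlanNode> {
        Arc::new(Coalesce { input: children.remove(0), target_batch_size: self.target_batch_size })
    }
}

/// Convenience constructor for demo operators.
fn op(name: &'static str, children: Vec<Arc<dyn PlanNode>>, small_output: bool) -> Arc<dyn PlanNode> {
    Arc::new(Op { name, children, small_output })
}

/// Bottom-up pass: rewrite children first, then wrap any node that declares
/// small output. No operator names are hard-coded here.
fn add_coalesce(plan: Arc<dyn PlanNode>, target_batch_size: usize) -> Arc<dyn PlanNode> {
    let children: Vec<Arc<dyn PlanNode>> = plan
        .children()
        .into_iter()
        .map(|child| add_coalesce(child, target_batch_size))
        .collect();
    let plan = plan.with_children(children);
    if plan.coalesce_output() {
        let wrapped: Arc<dyn PlanNode> = Arc::new(Coalesce { input: plan, target_batch_size });
        return wrapped;
    }
    plan
}

/// Render a plan as a nested string, e.g. `Parent[Child, Child]`.
fn render(plan: &Arc<dyn PlanNode>) -> String {
    let kids: Vec<String> = plan.children().iter().map(render).collect();
    if kids.is_empty() {
        plan.name()
    } else {
        format!("{}[{}]", plan.name(), kids.join(", "))
    }
}
```

Because the pass only consults `coalesce_output`, a custom operator defined outside the engine would opt in simply by overriding that method in its own trait implementation. Note that this sketch is not idempotent: running the pass twice would wrap already-wrapped operators again, so a real rule would need to detect an existing wrapper.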