Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Since https://github.com/apache/arrow/pull/9271, DataFusion uses Array.slice to pass data into the accumulators, rather than paying the overhead of building new arrays (possibly with only a few rows each).
However, slicing currently seems quite inefficient: it takes about 1/6 of the instructions in hash aggregates and does some allocations under the hood instead of the promised "zero copy". That is much more than, for example, take, which copies / shuffles the entire array based on indices.
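To make the expectation concrete, here is a minimal std-only sketch of what "zero copy" slicing would ideally cost compared with a take-style gather. Int32View and all of its methods are hypothetical names invented for this illustration; this is not Arrow's actual implementation and says nothing about where the real allocations come from.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a primitive array: a shared, immutable
/// buffer plus an (offset, len) window into it. Not Arrow's real layout.
#[derive(Clone, Debug)]
pub struct Int32View {
    values: Arc<[i32]>,
    offset: usize,
    len: usize,
}

impl Int32View {
    /// Builds a view covering the whole vector.
    pub fn from_vec(v: Vec<i32>) -> Self {
        let len = v.len();
        Int32View { values: Arc::from(v), offset: 0, len }
    }

    /// Idealized zero-copy slice: bumps a reference count and adjusts
    /// the window. No element is copied and no new buffer is allocated.
    /// Panics if the requested range is out of bounds.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int32View {
            values: Arc::clone(&self.values),
            offset: self.offset + offset,
            len,
        }
    }

    /// take-style gather: allocates a new buffer and copies one value
    /// per requested index.
    pub fn take(&self, indices: &[usize]) -> Self {
        let src = self.as_slice();
        let gathered: Vec<i32> = indices.iter().map(|&i| src[i]).collect();
        Self::from_vec(gathered)
    }

    /// The elements visible through this view.
    pub fn as_slice(&self) -> &[i32] {
        &self.values[self.offset..self.offset + self.len]
    }

    /// True when both views are backed by the same allocation.
    pub fn shares_buffer_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.values, &other.values)
    }
}
```

In this sketch a slice is a constant-time operation, while take is linear in the number of indices, which is why seeing slice cost more than take in a profile is surprising.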
Yes, slicing is suboptimal at the moment. Also, IMO slice should not be implemented on Array itself, but by each array implementation individually. I haven't touched that part yet, though.
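A rough sketch of what slicing per implementation could look like, assuming the shared trait stays minimal and each concrete type exposes its own slice returning its own type. The trait and the two array types below are hypothetical and deliberately simplified (no null bitmaps, no child data); they are not the existing Arrow types.

```rust
use std::sync::Arc;

/// Hypothetical minimal array trait: note that it has no `slice`.
pub trait Array {
    fn len(&self) -> usize;
}

/// Hypothetical fixed-width array: slicing only moves a window.
pub struct PrimitiveArray {
    values: Arc<[i64]>,
    offset: usize,
    len: usize,
}

impl PrimitiveArray {
    pub fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        PrimitiveArray { values: Arc::from(values), offset: 0, len }
    }

    /// Type-specific slice: shares the value buffer, returns the same type.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        PrimitiveArray {
            values: Arc::clone(&self.values),
            offset: self.offset + offset,
            len,
        }
    }

    pub fn values(&self) -> &[i64] {
        &self.values[self.offset..self.offset + self.len]
    }
}

impl Array for PrimitiveArray {
    fn len(&self) -> usize {
        self.len
    }
}

/// Hypothetical variable-width array: `offsets` has one more entry than
/// there are strings; slicing windows the offsets and shares the bytes.
pub struct Utf8Array {
    offsets: Arc<[usize]>,
    data: Arc<str>,
    start: usize,
    len: usize,
}

impl Utf8Array {
    pub fn new(items: &[&str]) -> Self {
        let mut offsets = vec![0usize];
        let mut data = String::new();
        for item in items {
            data.push_str(item);
            offsets.push(data.len());
        }
        Utf8Array {
            offsets: Arc::from(offsets),
            data: Arc::from(data),
            start: 0,
            len: items.len(),
        }
    }

    /// Type-specific slice: both buffers are shared, only the window moves.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Utf8Array {
            offsets: Arc::clone(&self.offsets),
            data: Arc::clone(&self.data),
            start: self.start + offset,
            len,
        }
    }

    /// The i-th string visible through this view.
    pub fn value(&self, i: usize) -> &str {
        assert!(i < self.len, "index out of bounds");
        let lo = self.offsets[self.start + i];
        let hi = self.offsets[self.start + i + 1];
        &self.data[lo..hi]
    }
}

impl Array for Utf8Array {
    fn len(&self) -> usize {
        self.len
    }
}
```

The point of the design is that each type knows which of its buffers need a new window, so a caller holding a concrete array never goes through a generic, type-erased path to slice it.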