Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Currently KernelExecutor handles preallocation of null bitmaps and other buffers based on simple flags on each Kernel. This is not very flexible and we end up leaving a lot of performance on the table in cases where we can preallocate but the behavior can't be captured in the available flags. For example, in the case of binary_string_join_element_wise, it would be possible to preallocate all buffers (even the character buffer) and write output into slices.
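As a rough illustration of the "preallocate everything, then write into slices" idea for an element-wise string join, here is a minimal standalone sketch. It does not use Arrow: `StringColumn` and `JoinElementWise` are hypothetical names, nulls are ignored, and both inputs are assumed to have the same length. The point is only that the offsets can be computed in a first pass, which fixes the size of the character buffer before any bytes are written.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a string array (not an Arrow type):
// n + 1 offsets plus one contiguous character buffer.
struct StringColumn {
  std::vector<int32_t> offsets;
  std::vector<char> data;

  std::size_t length() const { return offsets.empty() ? 0 : offsets.size() - 1; }

  std::string value(std::size_t i) const {
    return std::string(data.begin() + offsets[i], data.begin() + offsets[i + 1]);
  }
};

// Joins left[i] + separator + right[i] for every row.
// Assumes left.size() == right.size() and no null values.
StringColumn JoinElementWise(const std::vector<std::string>& left,
                             const std::vector<std::string>& right,
                             const std::string& separator) {
  const std::size_t n = left.size();
  StringColumn out;

  // Pass 1: compute all offsets; the last one is the total character count.
  out.offsets.resize(n + 1);
  out.offsets[0] = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t len = left[i].size() + separator.size() + right[i].size();
    out.offsets[i + 1] = out.offsets[i] + static_cast<int32_t>(len);
  }

  // One allocation for the whole character buffer.
  out.data.resize(static_cast<std::size_t>(out.offsets[n]));

  // Pass 2: each row writes only into its own slice of the buffer.
  for (std::size_t i = 0; i < n; ++i) {
    char* dst = out.data.data() + out.offsets[i];
    dst = std::copy(left[i].begin(), left[i].end(), dst);
    dst = std::copy(separator.begin(), separator.end(), dst);
    std::copy(right[i].begin(), right[i].end(), dst);
  }
  return out;
}
```

Because no row touches another row's slice in the second pass, the writes could in principle be distributed across batches without any reallocation.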
Exposing output preallocation as a public function would enable us to unit test it directly (currently Executors are only tested indirectly by calling compute::Functions) and to reuse it, for example to correctly preallocate a small temporary for pipelined execution.
One way this could be added is as a new method on each Kernel:
// Output preallocated Datums sufficient for execution of the kernel on each ExecBatch.
// The output Datums may not be identically chunked to the input batches, for example
// kernels which support contiguous output preallocation will preallocate a single Datum
// (and can then output into slices of that Datum).
Result<std::vector<Datum>> Kernel::prepare_output(
    const Kernel*, KernelContext*, const std::vector<ExecBatch>& inputs)
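To make the chunking behavior described in the comment concrete, here is a small toy model. It is not Arrow code: `Batch`, `Output`, `ToyKernel`, `PrepareOutput`, `ExecuteNegate`, and the `contiguous_prealloc` flag are all hypothetical stand-ins, and the "kernel" just negates integers. It shows one possible reading of the proposal, where a kernel that supports contiguous preallocation gets a single output that every batch writes a slice of, and any other kernel gets one output per batch.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins; none of these are Arrow types.
struct Batch {
  std::vector<int32_t> values;
};

struct Output {
  std::shared_ptr<std::vector<int32_t>> values;  // preallocated storage
};

struct ToyKernel {
  bool contiguous_prealloc;  // assumed flag: kernel can write into slices
};

// Preallocates outputs for all batches up front.
// Contiguous kernels get one output covering every batch;
// other kernels get one output per input batch.
std::vector<Output> PrepareOutput(const ToyKernel& kernel,
                                  const std::vector<Batch>& inputs) {
  std::vector<Output> outputs;
  if (kernel.contiguous_prealloc) {
    std::size_t total = 0;
    for (const Batch& b : inputs) total += b.values.size();
    outputs.push_back(Output{std::make_shared<std::vector<int32_t>>(total)});
  } else {
    for (const Batch& b : inputs) {
      outputs.push_back(
          Output{std::make_shared<std::vector<int32_t>>(b.values.size())});
    }
  }
  return outputs;
}

// Runs a toy "negate" kernel into outputs produced by PrepareOutput.
void ExecuteNegate(const ToyKernel& kernel, const std::vector<Batch>& inputs,
                   std::vector<Output>& outputs) {
  std::size_t offset = 0;  // running position within the contiguous output
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    std::vector<int32_t>& dst =
        kernel.contiguous_prealloc ? *outputs[0].values : *outputs[i].values;
    const std::size_t start = kernel.contiguous_prealloc ? offset : 0;
    for (std::size_t j = 0; j < inputs[i].values.size(); ++j) {
      dst[start + j] = -inputs[i].values[j];
    }
    offset += inputs[i].values.size();
  }
}
```

A function with this shape can be unit tested without going through a compute Function: a test only needs to check the number and sizes of the returned outputs for a given set of batches.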
Issue Links
- is a child of: ARROW-8894 [C++] C++ array kernels framework and execution buildout (umbrella issue) (Open)
- is related to: ARROW-16758 [C++] Rewrite ExecuteScalarExpression to not use ScalarExecutor (Open)
- relates to: ARROW-16755 [C++] Improve array expression and kernel evaluation performance on small inputs (Open)
- relates to: ARROW-11647 [C++][Compute] CastFromNull does not use preallocated buffers (Open)