Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
Currently, our aggregations are computed in a simple loop. However, as described here, horizontal operations (reductions across the elements of a vector) can also be vectorized with SIMD, with reported speedups of 2.7x.
The goal of this improvement is to support SIMD for the "sum" aggregation on primitive types.
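To illustrate why a sum can be vectorized at all: the loop can keep several independent partial sums ("lanes"), add each incoming chunk lane by lane (the vertical step), and only at the end collapse the lanes into a single value (the horizontal step). The sketch below is a minimal, hypothetical illustration in stable Rust; `sum_lanes` and `LANES` are illustrative names, not the crate's API, and it uses no explicit SIMD intrinsics, so whether it actually compiles to vector instructions is left to LLVM's auto-vectorizer.

```rust
/// Hypothetical lane count; 8 is an arbitrary illustrative choice.
const LANES: usize = 8;

/// Sums `values` using LANES independent accumulators, mirroring the
/// structure of a SIMD reduction (hypothetical helper, not the crate's API).
pub fn sum_lanes(values: &[f64]) -> f64 {
    let chunks = values.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are summed separately.
    let remainder = chunks.remainder();

    // Vertical step: one partial sum per lane, updated chunk by chunk.
    let mut acc = [0.0f64; LANES];
    for chunk in chunks {
        for (a, &v) in acc.iter_mut().zip(chunk) {
            *a += v;
        }
    }

    // Horizontal step: collapse the lanes into one value, then add the tail.
    acc.iter().sum::<f64>() + remainder.iter().sum::<f64>()
}
```

Because floating-point addition is not associative, this reordering can change the last bits of the result compared with a sequential loop; for integer types the result is the same (barring overflow).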
The code to modify is here. A good indication that this issue is complete is that running
cargo bench --bench aggregate_kernels && cargo bench --bench aggregate_kernels --features simd
shows a speed-up in the second (SIMD-enabled) run compared with the first.
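The two benchmark runs only differ if the `simd` Cargo feature swaps the kernel implementation at compile time. A minimal sketch of that pattern follows, assuming a `simd = []` entry under `[features]` in Cargo.toml; `sum_f64` is a hypothetical name, and the gated branch uses SSE2 intrinsics from `std::arch` purely for illustration, which may differ from the SIMD library the project actually uses.

```rust
/// Scalar fallback: a plain sequential loop, compiled when the
/// (assumed) `simd` feature is off or the target is not x86_64.
#[cfg(not(all(feature = "simd", target_arch = "x86_64")))]
pub fn sum_f64(values: &[f64]) -> f64 {
    values.iter().sum()
}

/// SIMD variant, compiled only with `--features simd` on x86_64.
#[cfg(all(feature = "simd", target_arch = "x86_64"))]
pub fn sum_f64(values: &[f64]) -> f64 {
    use std::arch::x86_64::{_mm_add_pd, _mm_loadu_pd, _mm_setzero_pd, _mm_storeu_pd};

    let chunks = values.chunks_exact(2);
    let tail: f64 = chunks.remainder().iter().sum();

    // SAFETY: SSE2 is part of the x86_64 baseline, and every chunk
    // holds exactly two f64 values, so each unaligned load is in bounds.
    let mut acc = unsafe { _mm_setzero_pd() };
    for chunk in chunks {
        acc = unsafe { _mm_add_pd(acc, _mm_loadu_pd(chunk.as_ptr())) };
    }

    // Horizontal step: spill both lanes and add them together.
    let mut lanes = [0.0f64; 2];
    unsafe { _mm_storeu_pd(lanes.as_mut_ptr(), acc) };
    lanes[0] + lanes[1] + tail
}
```

With this layout, the benchmark code calls `sum_f64` unchanged, and `cargo bench --features simd` selects the vectorized body.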
Issue Links
- links to