Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
We have many good benchmarks for individual kernels, but I don't know whether we have enough benchmarks for function and expression evaluation itself.
Some benchmarks (function_benchmark.cc) measure this, but I would like a clearer "bytes per second" figure for the function system when it runs a trivial function (e.g. an identity function that simply returns its input values unchanged).
In addition, we should measure the overhead of common tasks such as preallocation.
I would also like these benchmarks to be parameterized by batch size. Running on small batches enables (in theory) better cache utilization, but I suspect that the per-batch overhead of function evaluation may start to become a bottleneck at those sizes.
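To make the request concrete, here is a rough sketch of the kind of measurement I have in mind. It uses only the standard library and does not touch the actual function system: `IdentityKernel`, `ThroughputResult` and `MeasureIdentity` are hypothetical names made up for illustration, and the `memcpy` merely stands in for an identity function. The `preallocate` flag is an assumed way to separate allocation overhead from the call itself. A real benchmark would presumably live next to function_benchmark.cc and use the benchmark framework's own anti-optimization helpers instead of the checksum trick below.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a trivial "identity" function: copies the input
// batch to the output unchanged. This is NOT an API of the function system.
inline void IdentityKernel(const int64_t* in, int64_t* out, std::size_t length) {
  std::memcpy(out, in, length * sizeof(int64_t));
}

struct ThroughputResult {
  std::size_t num_batches;      // number of calls made to the function
  std::size_t bytes_processed;  // total input bytes pushed through it
  int64_t checksum;             // sum of the last output value of each batch
  double seconds;               // wall-clock time spent in the batch loop
  double bytes_per_second;      // 0 if the elapsed time was not measurable
};

// Runs the identity function over `total_values` int64 values (all set to 42)
// in batches of `batch_size`. With `preallocate == false`, a fresh output
// buffer is allocated for every batch so allocation cost shows up in the time.
inline ThroughputResult MeasureIdentity(std::size_t total_values,
                                        std::size_t batch_size,
                                        bool preallocate) {
  if (batch_size == 0) batch_size = 1;  // guard against a loop that never advances

  std::vector<int64_t> input(total_values, 42);
  std::vector<int64_t> reused(preallocate ? batch_size : 0);

  ThroughputResult result{0, 0, 0, 0.0, 0.0};
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t offset = 0; offset < total_values; offset += batch_size) {
    const std::size_t length = std::min(batch_size, total_values - offset);
    if (preallocate) {
      IdentityKernel(input.data() + offset, reused.data(), length);
      result.checksum += reused[length - 1];
    } else {
      std::vector<int64_t> fresh(length);  // per-batch allocation under test
      IdentityKernel(input.data() + offset, fresh.data(), length);
      result.checksum += fresh[length - 1];
    }
    ++result.num_batches;
    result.bytes_processed += length * sizeof(int64_t);
  }
  const auto stop = std::chrono::steady_clock::now();

  result.seconds = std::chrono::duration<double>(stop - start).count();
  if (result.seconds > 0.0) {
    result.bytes_per_second =
        static_cast<double>(result.bytes_processed) / result.seconds;
  }
  return result;
}
```

Calling `MeasureIdentity` with a fixed `total_values` and a sweep of batch sizes (say 64, 1024, 65536), once with `preallocate` set and once without, would give one bytes-per-second figure per configuration. The gap between neighbouring rows is then a first estimate of the per-batch overhead this ticket is about.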