Details
-
Improvement
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
0.11.1, 0.12.0, 0.11.2
-
None
-
None
Description
As we know, we are still struggling to decide which path to take for bare-metal acceleration of in-core math.
Meanwhile, a simple no-brainer improvement is to add decision paths and apply multithreaded matrix-matrix multiplication (and maybe other operations as well; mmul is probably the most prominent beneficiary at the moment, being both easy to do and likely to show a statistically significant improvement).
So adding multithreaded logic to mmul is one path.
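To make the mmul idea concrete, here is a minimal sketch of a row-parallel dense multiplication in plain Java. It is only an illustration under stated assumptions, not the project's actual implementation: the class name `ParallelMmulSketch`, the `double[][]` row-major layout, and the use of a parallel `IntStream` (which runs on the common fork-join pool) are all hypothetical choices made for the example.

```java
import java.util.stream.IntStream;

// Hypothetical illustration only; names and data layout are assumptions.
final class ParallelMmulSketch {

    /** Computes C = A * B for dense row-major matrices, one parallel task per row of C. */
    static double[][] multiply(double[][] a, double[][] b) {
        if (a.length == 0 || b.length == 0) {
            throw new IllegalArgumentException("empty operand");
        }
        int n = a.length;      // rows of A and C
        int k = b.length;      // rows of B, must equal columns of A
        int m = b[0].length;   // columns of B and C
        double[][] c = new double[n][m];

        // Each row of C is written by exactly one task, so no locking is needed.
        IntStream.range(0, n).parallel().forEach(i -> {
            double[] ai = a[i];
            if (ai.length != k) {
                throw new IllegalArgumentException("dimension mismatch in row " + i);
            }
            double[] ci = c[i];
            // i-p-j loop order walks rows of B sequentially, which is cache friendly.
            for (int p = 0; p < k; p++) {
                double aip = ai[p];
                double[] bp = b[p];
                for (int j = 0; j < m; j++) {
                    ci[j] += aip * bp[j];
                }
            }
        });
        return c;
    }
}
```

Splitting by output row keeps the tasks independent; a real decision path would presumably also fall back to the sequential algorithm for small operands, where thread scheduling costs outweigh the gain.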
Another path is automatic adjustment of multithreading.
In the front end, we probably want to use all available cores.
In the backend, we can oversubscribe cores, but doing so by more than 2x or 3x is probably inadvisable: past that point returns diminish, driven by the growing likelihood of context-switching overhead.
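The automatic adjustment could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class and method names are hypothetical, "backend" is modeled as several tasks sharing one node, and the cap of 3 simply mirrors the 2x to 3x figure above rather than any measured value.

```java
// Hypothetical illustration only; names and the sharing model are assumptions.
final class MmulThreadingSketch {

    /** Upper bound on total threads per core, taken from the "2x or 3x" estimate. */
    static final int MAX_OVERSUBSCRIPTION = 3;

    /** Core count as reported by the JVM. */
    static int detectCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Front end: use every available core, and never fewer than one thread. */
    static int frontEndThreads(int cores) {
        return Math.max(1, cores);
    }

    /**
     * Backend: several tasks share one node, so the total thread budget is
     * cores * factor, with factor clamped to [1, MAX_OVERSUBSCRIPTION],
     * split evenly across the concurrently running tasks.
     */
    static int backEndThreadsPerTask(int cores, int concurrentTasks, int factor) {
        int clamped = Math.max(1, Math.min(factor, MAX_OVERSUBSCRIPTION));
        int budget = Math.max(1, cores) * clamped;
        return Math.max(1, budget / Math.max(1, concurrentTasks));
    }
}
```

For example, on an 8-core node running 4 concurrent tasks with a factor of 2, each task would get 4 threads, for 16 threads in total. Requesting a factor of 10 would be clamped to 3, giving 6 threads per task.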