Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
Our sparse-dense matrix multiply is already cache-conscious but uses very small static block sizes, which were optimized for moderate sparsity. However, for very sparse matrices (and skinny right-hand-side matrices), the small block sizes add substantial overhead of more than an order of magnitude. This task aims to make these block sizes adaptive, consistent with our cache-conscious implementations of sparsity-exploiting matrix multiply operators such as wsloss.