Details

Type: New Feature
Priority: Minor
Status: Resolved
Resolution: Incomplete
Affects Version/s: 2.4.5
Description
MLlib does not currently support multiplication of sparse matrices. When multiplying block matrices with sparse blocks, the sparse blocks are first converted to dense matrices. This leads to large increases in memory utilization for certain problems.
I'd like to propose adding support for local sparse-sparse matrix multiplication to MLlib, as well as local dense-sparse matrix multiplication. With these changes, the case clause that converts sparse blocks to dense matrices in the block matrix multiply method could be removed.
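To make the memory cost of that conversion concrete, here is a rough back-of-the-envelope sketch in plain Scala (the sizes are assumptions for illustration: 8-byte values, 4-byte indices, a 1000x1000 block at 0.1% density), comparing CSC storage against the dense copy the current path materializes:

```scala
// Illustrative arithmetic only: storage for one 1000x1000 block with
// 1000 nonzeros (0.1% density), in CSC form vs. the dense copy that
// the current multiply path creates for sparse blocks.
val n = 1000
val nnz = 1000
val sparseBytes = 8L * nnz + 4L * nnz + 4L * (n + 1) // values + rowIndices + colPtrs
val denseBytes = 8L * n * n                          // every entry materialized
println(s"sparse: $sparseBytes bytes, dense: $denseBytes bytes")
// roughly 16 KB vs 8 MB -- a ~500x blow-up per block at this density
```

The gap grows with block size and sparsity, and it is paid for every sparse block that participates in a multiply.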
One question is whether the result of sparse-sparse matrix multiplication should be sparse or dense, since the product of two sparse matrices can be quite dense depending on the matrices. I propose returning a sparse matrix, however, and letting the application convert the result to a dense matrix if necessary. There is some precedent for this with the block matrix add method, which returns sparse matrix blocks even when adding a sparse matrix block to a dense matrix block.
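The worst case is easy to see with a small plain-Scala sketch (naive dense arithmetic, purely illustrative, not MLlib code): if A is nonzero only in its first column and B only in its first row, each factor has just n nonzeros out of n*n entries, yet their product is completely dense:

```scala
// Each factor has n nonzeros (n = 4 here), but A*B is fully dense:
// c(i)(j) = sum_k a(i)(k) * b(k)(j) = a(i)(0) * b(0)(j) = 1 for all i, j.
val n = 4
val a = Array.tabulate(n, n)((i, j) => if (j == 0) 1.0 else 0.0) // first column only
val b = Array.tabulate(n, n)((i, j) => if (i == 0) 1.0 else 0.0) // first row only
val c = Array.tabulate(n, n)((i, j) =>
  (0 until n).map(k => a(i)(k) * b(k)(j)).sum)
val nnz = c.flatten.count(_ != 0.0) // all n * n entries are nonzero
```

So a sparse return type can be arbitrarily wasteful in the worst case, but it never loses information, which is why deferring the dense conversion to the caller seems like the safer default.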
As for the implementation, one option would be to leverage Breeze's existing sparse matrix multiplication, as MLlib currently does for matrix addition. Another would be to add support for sparse-sparse multiplication to the BLAS wrapper, which would be consistent with the sparse-dense multiplication implementation and could support a more efficient routine for transposed matrices (as Breeze does not support transposed matrices). The exact algorithm would follow that laid out in "Sparse Matrix Multiplication Package (SMMP)".
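For reference, here is a minimal sketch of the SMMP-style row-by-row algorithm over raw CSR arrays, using a dense accumulator and a marker array so each output row costs O(flops) rather than O(numCols). The `CSR` type and all names below are illustrative, not MLlib's API; MLlib's local matrices are CSC, so a real implementation would work over column pointers (or, equivalently, compute B^T * A^T):

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative CSR container; MLlib itself stores matrices in CSC form.
case class CSR(numRows: Int, numCols: Int,
               rowPtr: Array[Int], colInd: Array[Int], values: Array[Double])

def multiply(a: CSR, b: CSR): CSR = {
  require(a.numCols == b.numRows, "dimension mismatch")
  val acc = new Array[Double](b.numCols)  // dense accumulator for one output row
  val marker = Array.fill(b.numCols)(-1)  // last row in which each column was touched
  val rowPtr = new Array[Int](a.numRows + 1)
  val colBuf = ArrayBuffer.empty[Int]
  val valBuf = ArrayBuffer.empty[Double]
  for (i <- 0 until a.numRows) {
    var cols = List.empty[Int]
    // Row i of the product is a sparse combination of rows of B.
    for (k <- a.rowPtr(i) until a.rowPtr(i + 1)) {
      val aVal = a.values(k)
      val kk = a.colInd(k)
      for (j <- b.rowPtr(kk) until b.rowPtr(kk + 1)) {
        val col = b.colInd(j)
        if (marker(col) != i) { marker(col) = i; acc(col) = 0.0; cols ::= col }
        acc(col) += aVal * b.values(j)
      }
    }
    // Emit the accumulated row in column order.
    for (col <- cols.sorted) { colBuf += col; valBuf += acc(col) }
    rowPtr(i + 1) = colBuf.length
  }
  CSR(a.numRows, b.numCols, rowPtr, colBuf.toArray, valBuf.toArray)
}
```

A built-in routine along these lines would also make it straightforward to handle the transposed cases directly, which is the main limitation of delegating to Breeze.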
This would likely not be a huge change but would take some time to test and benchmark properly, so before I put up a code diff I would be curious to know:
- Is there any interest in supporting this functionality in MLlib?
- Is there a preference for the return type of sparse-sparse multiplication? (i.e. sparse or dense)
- Is there a preference for the implementation? (Breeze vs. a built-in one)
Some tickets which include related functionality or identified this particular issue but never solved it: SPARK-16820, SPARK-3418.