[SPARK-31122] Add support for sparse matrix multiplication - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.4.5
Fix Version/s: None
Component/s: MLlib
Labels:
- bulk-closed

Description

MLlib does not currently support multiplication of sparse matrices. When multiplying block matrices with sparse blocks, the sparse blocks are first converted to dense matrices. This leads to large increases in memory utilization for certain problems.
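To give a rough sense of the memory increase, the sketch below compares the raw array storage of a dense block with that of a sparse block in compressed sparse column (CSC) form. It assumes 8-byte doubles for values and 4-byte integers for row indices and column pointers, and it ignores JVM object overhead. The block size and nonzero count are made-up illustration values, not measurements.

```python
def dense_bytes(num_rows, num_cols):
    """Raw storage for a dense block: one 8-byte double per entry."""
    return 8 * num_rows * num_cols


def csc_bytes(num_cols, nnz):
    """Raw storage for a CSC block: values, row indices, column pointers."""
    values = 8 * nnz                 # one double per stored entry
    row_indices = 4 * nnz            # one int per stored entry
    col_ptrs = 4 * (num_cols + 1)    # one int per column, plus one
    return values + row_indices + col_ptrs


# Hypothetical block: 10,000 x 10,000 with 100,000 stored entries (0.1% density).
rows, cols, nnz = 10_000, 10_000, 100_000
dense = dense_bytes(rows, cols)
sparse = csc_bytes(cols, nnz)
print(f"dense: {dense} bytes, CSC: {sparse} bytes, ratio: {dense / sparse:.0f}x")
```

Under these assumptions, converting such a block to dense before multiplying grows its footprint by a factor of several hundred.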

I'd like to propose adding support for local sparse matrix multiplication to MLlib, as well as local dense-sparse matrix multiplication. With these changes, the case clause which converts sparse blocks to dense matrices in the block matrix multiply method could be removed.

One open question is whether sparse-sparse matrix multiplication should return a sparse or a dense matrix, since the product of two sparse matrices can be quite dense depending on their sparsity patterns. I propose returning a sparse matrix and letting the application convert the result to a dense matrix if necessary. There is some precedent for this in the block matrix add method, which returns sparse matrix blocks even when adding a sparse matrix block to a dense matrix block.
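A small illustration of how much the density of the product can vary: the sketch below counts the structurally nonzero positions of a product from the sparsity patterns of its operands. The dictionary-of-keys representation and the function name `product_pattern` are choices made for this example only and are not MLlib types.

```python
def product_pattern(a, b):
    """Return the set of (row, col) positions that can be nonzero in A * B.

    a and b map (row, col) -> value for the stored entries of A and B.
    """
    # Group the column indices of B by row so each entry of A finds its partners.
    b_rows = {}
    for (k, j) in b:
        b_rows.setdefault(k, []).append(j)

    pattern = set()
    for (i, k) in a:
        for j in b_rows.get(k, ()):
            pattern.add((i, j))
    return pattern


n = 100

# Worst case: A has one full column, B has one full row (n stored entries each).
col_a = {(i, 0): 1.0 for i in range(n)}
row_b = {(0, j): 1.0 for j in range(n)}
worst = len(product_pattern(col_a, row_b))

# Benign case: both operands are diagonal (also n stored entries each).
diag = {(i, i): 1.0 for i in range(n)}
benign = len(product_pattern(diag, diag))

print(worst, benign)
```

Both pairs of operands store the same number of entries, yet the first product is completely dense and the second stays diagonal, which is why the return type cannot be chosen well from the operand densities alone.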

As for the implementation, one option would be to leverage Breeze's existing sparse matrix multiplication, as MLlib currently does for matrix addition. Another would be to add support for sparse-sparse multiplication to the BLAS wrapper, which would be consistent with the sparse-dense multiplication implementation and could support a more efficient routine for transposed matrices (as Breeze does not support transposed matrices). The exact algorithm would follow that laid out in "Sparse Matrix Multiplication Package (SMMP)".
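For reference, a minimal sketch of column-wise sparse-sparse multiplication on CSC arrays is shown below. It is a simplified single-pass variant that uses a dense accumulator per output column. It is not the SMMP routine itself and not existing MLlib or Breeze code, and the function name `csc_matmul` and the tuple layout `(col_ptrs, row_indices, values)` are assumptions made for this illustration.

```python
def csc_matmul(num_rows_a, a, b):
    """Compute C = A * B with all three matrices in CSC form.

    a and b are (col_ptrs, row_indices, values) tuples. num_rows_a is the
    number of rows of A, which is also the number of rows of C.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    num_cols_b = len(b_ptr) - 1

    c_ptr, c_idx, c_val = [0], [], []
    acc = [0.0] * num_rows_a      # dense accumulator for one output column
    marker = [-1] * num_rows_a    # marker[i] == j: row i already touched in column j

    for j in range(num_cols_b):
        touched = []
        # Column j of C is a linear combination of the columns of A,
        # weighted by the stored entries in column j of B.
        for p in range(b_ptr[j], b_ptr[j + 1]):
            k, b_kj = b_idx[p], b_val[p]
            for q in range(a_ptr[k], a_ptr[k + 1]):
                i = a_idx[q]
                if marker[i] != j:
                    marker[i] = j
                    acc[i] = 0.0
                    touched.append(i)
                acc[i] += a_val[q] * b_kj
        touched.sort()            # keep row indices ordered within each column
        for i in touched:
            c_idx.append(i)
            c_val.append(acc[i])
        c_ptr.append(len(c_idx))

    return c_ptr, c_idx, c_val


# A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]], so A * B = [[0, 4], [15, 8]].
A = ([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2], [1, 0], [5.0, 4.0])
C = csc_matmul(2, A, B)
print(C)
```

The work is proportional to the number of multiply-add pairs rather than to the dense dimensions. Entries that cancel to exactly zero are still stored in this sketch; a production version would need to decide whether to drop them.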

This would likely not be a huge change but would take some time to test and benchmark properly, so before I put up a code diff I would be curious to know:

- Is there any interest in supporting this functionality in MLlib?
- Is there a preference for the return type of sparse-sparse multiplication (i.e. sparse or dense)?
- Is there a preference for the implementation (Breeze vs. a built-in one)?

Some tickets which include related functionality or identified this particular issue but never solved it: ~~SPARK-16820~~, ~~SPARK-3418~~.

People

Assignee: Unassigned

Reporter: Alex Favaro

Votes: 0

Watchers: 1

Dates

Created: 11/Mar/20 12:01

Updated: 25/May/21 01:52

Resolved: 25/May/21 01:39