[SPARK-4409] Additional (but limited) Linear Algebra Utils - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3.0
Component/s: MLlib
Labels: None
Target Version/s: 1.3.0

Description

This ticket is to discuss the addition of a very limited number of local matrix manipulation and generation methods that would help the further development of algorithms on top of BlockMatrix (~~SPARK-3974~~), such as Randomized SVD and Multi Model Training (~~SPARK-1486~~).
The methods proposed for addition are:
Methods for `Matrix`:

map: maps the values in the matrix with a given function. Produces a new matrix.
update: updates the values in the matrix with a given function. Occurs in place.
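
The difference between the two can be sketched in plain Scala. This is only an illustration under stated assumptions: `LocalDense` is a hypothetical stand-in rather than the MLlib class, and the column-major flat array is an assumed storage layout.

```scala
// Hypothetical stand-in for a local dense matrix; not the MLlib class.
// Entries are assumed to be stored column-major in one flat array.
final class LocalDense(val numRows: Int, val numCols: Int, val values: Array[Double]) {
  require(values.length == numRows * numCols, "values must hold numRows * numCols entries")

  // map: apply f to every entry and return a new matrix; this one is untouched.
  def map(f: Double => Double): LocalDense =
    new LocalDense(numRows, numCols, values.map(f))

  // update: apply f to every entry in place and return this matrix for chaining.
  def update(f: Double => Double): LocalDense = {
    var i = 0
    while (i < values.length) {
      values(i) = f(values(i))
      i += 1
    }
    this
  }
}

val original = new LocalDense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
val doubled = original.map(_ * 2.0)      // allocates a second value array
val updated = original.update(_ + 1.0)   // mutates original, allocates nothing
```

Here `map` pays for a fresh array, while `update` avoids the allocation at the cost of mutating shared state.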

Factory methods for `DenseMatrix`:

*zeros: Generate a matrix consisting of zeros
*ones: Generate a matrix consisting of ones
*eye: Generate an identity matrix
*rand: Generate a matrix consisting of i.i.d. uniform random numbers
*randn: Generate a matrix consisting of i.i.d. Gaussian random numbers
*diag: Generate a diagonal matrix from a supplied vector
*These methods already exist as factory methods for `Matrices`. However, in cases where we require a `DenseMatrix`, you constantly have to add `.asInstanceOf[DenseMatrix]` everywhere, which makes the code "dirtier". I propose moving these functions to factory methods for `DenseMatrix`, where the output will be a `DenseMatrix`; the factory methods for `Matrices` will then call these functions directly and output a generic `Matrix`.
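
The delegation described above might look roughly like the following. It is a minimal sketch, not the code from the pull request: `MatrixSketch`, `DenseSketch` and `MatricesSketch` are hypothetical names, and the column-major layout is an assumption.

```scala
// Hypothetical minimal types that mirror the ticket's naming; not MLlib code.
trait MatrixSketch {
  def numRows: Int
  def numCols: Int
}

final class DenseSketch(val numRows: Int, val numCols: Int, val values: Array[Double])
  extends MatrixSketch

object DenseSketch {
  // Typed factories: callers receive a DenseSketch and never need a cast.
  def zeros(numRows: Int, numCols: Int): DenseSketch =
    new DenseSketch(numRows, numCols, new Array[Double](numRows * numCols))

  def ones(numRows: Int, numCols: Int): DenseSketch =
    new DenseSketch(numRows, numCols, Array.fill(numRows * numCols)(1.0))

  def diag(diagonal: Array[Double]): DenseSketch = {
    val n = diagonal.length
    val m = zeros(n, n)
    // In column-major order, entry (i, i) of an n x n matrix sits at index i * n + i.
    for (i <- 0 until n) m.values(i * n + i) = diagonal(i)
    m
  }

  def eye(n: Int): DenseSketch = diag(Array.fill(n)(1.0))

  def rand(numRows: Int, numCols: Int, rng: java.util.Random): DenseSketch =
    new DenseSketch(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble()))

  def randn(numRows: Int, numCols: Int, rng: java.util.Random): DenseSketch =
    new DenseSketch(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian()))
}

object MatricesSketch {
  // Generic factories delegate to the typed ones and only widen the return type.
  def zeros(numRows: Int, numCols: Int): MatrixSketch = DenseSketch.zeros(numRows, numCols)
  def eye(n: Int): MatrixSketch = DenseSketch.eye(n)
}

val identity: DenseSketch = DenseSketch.eye(3)     // no asInstanceOf needed
val generic: MatrixSketch = MatricesSketch.eye(3)  // same data, generic static type
```

Keeping the logic in the typed factory means each method is written once, and only the declared return type differs between the two entry points.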

Factory methods for `SparseMatrix`:

speye: Identity matrix in sparse format. Saves a ton of memory when dimensions are large, especially in Multi Model Training, where each row needs to be multiplied by a scalar.
sprand: Generate a sparse matrix with a given density consisting of i.i.d. uniform random numbers.
sprandn: Generate a sparse matrix with a given density consisting of i.i.d. Gaussian random numbers.
diag: Generate a diagonal matrix from a supplied vector. It is memory efficient because it stores only the diagonal. Again, very helpful in Multi Model Training.
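
To make the memory argument concrete, here is a small sketch of a sparse identity and diagonal matrix in compressed sparse column (CSC) form. `CscSketch` is a hypothetical type written for this illustration; it is not the MLlib `SparseMatrix`, and it stores every supplied diagonal entry, including any zeros.

```scala
// Hypothetical CSC container: column j owns the stored entries
// colPtrs(j) until colPtrs(j + 1) of rowIndices and values.
final class CscSketch(
    val numRows: Int,
    val numCols: Int,
    val colPtrs: Array[Int],
    val rowIndices: Array[Int],
    val values: Array[Double]) {

  // Look up entry (i, j); anything not stored is an implicit zero.
  def apply(i: Int, j: Int): Double =
    (colPtrs(j) until colPtrs(j + 1))
      .find(k => rowIndices(k) == i)
      .map(k => values(k))
      .getOrElse(0.0)
}

object CscSketch {
  // Diagonal matrix: exactly one stored entry per column, at row == column.
  def diag(diagonal: Array[Double]): CscSketch = {
    val n = diagonal.length
    new CscSketch(n, n, Array.range(0, n + 1), Array.range(0, n), diagonal.clone())
  }

  // Sparse identity: a diagonal matrix whose diagonal is all ones.
  def speye(n: Int): CscSketch = diag(Array.fill(n)(1.0))
}

val eye = CscSketch.speye(1000)
val storedEntries = eye.values.length                // n stored doubles
val denseEntries = eye.numRows.toLong * eye.numCols  // n * n doubles if stored densely
```

For an n x n identity this layout holds n doubles plus 2n + 1 integers, compared with n * n doubles in a dense layout.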

Factory methods for `Matrices`:

Include all the factory methods given above, but return a generic `Matrix` rather than `SparseMatrix` or `DenseMatrix`.
horzCat: Horizontally concatenate matrices to form one larger matrix. Very useful both in Multi Model Training and for the repartitioning of BlockMatrix.
vertCat: Vertically concatenate matrices to form one larger matrix. Very useful for the repartitioning of BlockMatrix.
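
A rough sketch of the two concatenations on dense, column-major data follows. The `ColMajor` type and both function bodies are assumptions made for illustration and do not reproduce the MLlib implementation.

```scala
// Hypothetical dense matrix with column-major storage (an assumption of this sketch).
final case class ColMajor(numRows: Int, numCols: Int, values: Array[Double])

// Horizontal concatenation: row counts must agree. In column-major order the
// result is simply the value arrays appended one after another.
def horzCat(ms: Seq[ColMajor]): ColMajor = {
  require(ms.nonEmpty && ms.forall(_.numRows == ms.head.numRows), "row counts must match")
  ColMajor(ms.head.numRows, ms.map(_.numCols).sum, ms.flatMap(_.values.toSeq).toArray)
}

// Vertical concatenation: column counts must agree. Each output column is the
// matching column of every input, stacked top to bottom.
def vertCat(ms: Seq[ColMajor]): ColMajor = {
  require(ms.nonEmpty && ms.forall(_.numCols == ms.head.numCols), "column counts must match")
  val numCols = ms.head.numCols
  val values = (0 until numCols).flatMap { j =>
    ms.flatMap(m => m.values.slice(j * m.numRows, (j + 1) * m.numRows).toSeq)
  }.toArray
  ColMajor(ms.map(_.numRows).sum, numCols, values)
}

val a = ColMajor(2, 2, Array(1.0, 2.0, 3.0, 4.0))  // columns (1, 2) and (3, 4)
val b = ColMajor(2, 1, Array(5.0, 6.0))            // single column (5, 6)
val c = ColMajor(1, 2, Array(7.0, 8.0))            // single row (7, 8)

val wide = horzCat(Seq(a, b))  // 2 x 3
val tall = vertCat(Seq(a, c))  // 3 x 2
```

With this layout, horizontal concatenation is a plain append, while vertical concatenation has to interleave the inputs column by column.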

Issue Links

links to

[Github] Pull Request #3319 (brkyvz)

People

Assignee: Burak Yavuz

Reporter: Burak Yavuz

Votes: 0

Watchers: 3

Dates

Created: 14/Nov/14 19:44

Updated: 29/Dec/14 21:24

Resolved: 29/Dec/14 21:24