Spark / SPARK-4409

Additional (but limited) Linear Algebra Utils

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: MLlib
    • Labels:
      None
    • Target Version/s:

      Description

      This ticket is to discuss the addition of a very limited number of local matrix manipulation and generation methods that would help further development of algorithms on top of BlockMatrix (SPARK-3974), such as Randomized SVD and Multi Model Training (SPARK-1486).
      The proposed additions are:
      For `Matrix`:

      • map: applies a given function to every value in the matrix and produces a new matrix; the original is left unchanged.
      • update: applies a given function to every value in the matrix in place.
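      A minimal sketch of the intended difference between `map` and `update`, written without Spark. `LocalDenseMatrix` is a hypothetical stand-in (not MLlib's `DenseMatrix`) that stores its values in column-major order; only the method names mirror the proposal.

```scala
// Hypothetical minimal dense matrix for illustration only (not MLlib's API).
// Values are stored in column-major order: entry (i, j) is values(i + j * numRows).
final class LocalDenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double]) {
  require(values.length == numRows * numCols, "values must have numRows * numCols entries")

  def apply(i: Int, j: Int): Double = values(i + j * numRows)

  // map: applies f to every entry and returns a NEW matrix; `this` is not modified.
  def map(f: Double => Double): LocalDenseMatrix =
    new LocalDenseMatrix(numRows, numCols, values.map(f))

  // update: applies f to every entry IN PLACE and returns `this` for chaining.
  def update(f: Double => Double): LocalDenseMatrix = {
    var k = 0
    while (k < values.length) {
      values(k) = f(values(k))
      k += 1
    }
    this
  }
}

// Usage: map leaves m untouched; update then mutates m's backing array.
val m = new LocalDenseMatrix(2, 2, Array(1.0, 2.0, 3.0, 4.0))
val doubled = m.map(_ * 2) // values: 2.0, 4.0, 6.0, 8.0
m.update(_ + 1)            // m's values: 2.0, 3.0, 4.0, 5.0
```

      Returning `this` from `update` is a design choice of the sketch; it lets in-place steps be chained without allocating new matrices.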

      Factory methods for `DenseMatrix`:

      • *zeros: Generate a matrix consisting of zeros
      • *ones: Generate a matrix consisting of ones
      • *eye: Generate an identity matrix
      • *rand: Generate a matrix consisting of i.i.d. uniform random numbers
      • *randn: Generate a matrix consisting of i.i.d. gaussian random numbers
      • *diag: Generate a diagonal matrix from a supplied vector
        *These methods already exist as factory methods in `Matrices`; however, when a `DenseMatrix` is required, you constantly have to add `.asInstanceOf[DenseMatrix]` everywhere, which clutters the code. I propose moving these functions to factory methods on `DenseMatrix` that return a `DenseMatrix`, and having the factory methods in `Matrices` call them directly and return a generic `Matrix`.
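      The point of the footnote is the static return type. The sketch below uses hypothetical names (`SketchMatrix`, `SketchDenseMatrix`, `SketchMatrices`) in place of MLlib's `Matrix`, `DenseMatrix` and `Matrices`: factories on the concrete class return the concrete type, and the generic factory only delegates and widens the type, so no cast is needed. Passing an explicit `scala.util.Random` to `rand`/`randn` is an assumption made for reproducibility.

```scala
import scala.util.Random

// Hypothetical stand-in for the generic `Matrix` type (not MLlib's API).
trait SketchMatrix {
  def numRows: Int
  def numCols: Int
}

// Hypothetical column-major dense matrix: entry (i, j) is values(i + j * numRows).
final class SketchDenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double])
    extends SketchMatrix {
  require(values.length == numRows * numCols)
  def apply(i: Int, j: Int): Double = values(i + j * numRows)
}

// Factories on the concrete type return SketchDenseMatrix, so callers need no cast.
object SketchDenseMatrix {
  def zeros(m: Int, n: Int): SketchDenseMatrix = new SketchDenseMatrix(m, n, new Array[Double](m * n))
  def ones(m: Int, n: Int): SketchDenseMatrix = new SketchDenseMatrix(m, n, Array.fill(m * n)(1.0))
  def eye(n: Int): SketchDenseMatrix = diag(Array.fill(n)(1.0))
  // i.i.d. uniform entries in [0, 1)
  def rand(m: Int, n: Int, rng: Random): SketchDenseMatrix =
    new SketchDenseMatrix(m, n, Array.fill(m * n)(rng.nextDouble()))
  // i.i.d. standard Gaussian entries
  def randn(m: Int, n: Int, rng: Random): SketchDenseMatrix =
    new SketchDenseMatrix(m, n, Array.fill(m * n)(rng.nextGaussian()))
  // Square matrix with d on the diagonal and zeros elsewhere.
  def diag(d: Array[Double]): SketchDenseMatrix = {
    val n = d.length
    val values = new Array[Double](n * n)
    var i = 0
    while (i < n) { values(i + i * n) = d(i); i += 1 }
    new SketchDenseMatrix(n, n, values)
  }
}

// The generic factory delegates and widens the static type to SketchMatrix.
object SketchMatrices {
  def zeros(m: Int, n: Int): SketchMatrix = SketchDenseMatrix.zeros(m, n)
  def eye(n: Int): SketchMatrix = SketchDenseMatrix.eye(n)
}

val id: SketchDenseMatrix = SketchDenseMatrix.eye(3) // concrete type, no asInstanceOf
val generic: SketchMatrix = SketchMatrices.eye(3)     // generic type for generic callers
```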

      Factory methods for `SparseMatrix`:

      • speye: Identity matrix in sparse format. Saves a great deal of memory when dimensions are large, especially in Multi Model Training, where each row must be multiplied by a scalar.
      • sprand: Generate a sparse matrix with a given density consisting of i.i.d. uniform random numbers.
      • sprandn: Generate a sparse matrix with a given density consisting of i.i.d. gaussian random numbers.
      • diag: Generate a diagonal matrix from a supplied vector. Unlike the dense version, it is memory efficient because it stores only the diagonal. Again, very helpful in Multi Model Training.
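      To make the memory argument concrete: a dense n × n identity holds n² doubles, while a compressed sparse column (CSC) identity holds n values, n row indices and n + 1 column pointers. For n = 10,000 that is 10^8 doubles (800 MB at 8 bytes each) versus 10,000 doubles plus 20,001 ints, roughly 160 KB, ignoring JVM object headers. The sketch below uses a hypothetical `CscMatrix` (not MLlib's `SparseMatrix`); `spdiag` is a hypothetical name for the sparse `diag`. Its `sprand` keeps each entry independently with probability `density`, so the number of non-zeros is only about density × m × n; sampling an exact count of positions would be an equally valid choice.

```scala
import scala.util.Random
import scala.collection.mutable.ArrayBuffer

// Hypothetical CSC matrix for illustration (not MLlib's API). Column j's non-zeros
// are values(colPtrs(j) until colPtrs(j + 1)), with their rows in rowIndices.
final class CscMatrix(val numRows: Int, val numCols: Int,
                      val colPtrs: Array[Int], val rowIndices: Array[Int], val values: Array[Double]) {
  require(colPtrs.length == numCols + 1 && rowIndices.length == values.length)

  // Linear scan of column j; fine for a sketch.
  def apply(i: Int, j: Int): Double = {
    var k = colPtrs(j)
    while (k < colPtrs(j + 1)) {
      if (rowIndices(k) == i) return values(k)
      k += 1
    }
    0.0
  }
}

object CscMatrix {
  // Sparse diagonal: stores only the n diagonal entries, one per column.
  def spdiag(d: Array[Double]): CscMatrix = {
    val n = d.length
    new CscMatrix(n, n, Array.range(0, n + 1), Array.range(0, n), d.clone())
  }

  def speye(n: Int): CscMatrix = spdiag(Array.fill(n)(1.0))

  // Keeps each entry with probability `density`; kept values come from `sample`.
  private def sprandWith(m: Int, n: Int, density: Double, rng: Random)(sample: => Double): CscMatrix = {
    require(density >= 0.0 && density <= 1.0, "density must be in [0, 1]")
    val colPtrs = new Array[Int](n + 1)
    val rows = ArrayBuffer.empty[Int]
    val vals = ArrayBuffer.empty[Double]
    for (j <- 0 until n) {
      for (i <- 0 until m if rng.nextDouble() < density) {
        rows += i
        vals += sample
      }
      colPtrs(j + 1) = rows.length
    }
    new CscMatrix(m, n, colPtrs, rows.toArray, vals.toArray)
  }

  // Uniform [0, 1) non-zeros.
  def sprand(m: Int, n: Int, density: Double, rng: Random): CscMatrix =
    sprandWith(m, n, density, rng)(rng.nextDouble())

  // Standard Gaussian non-zeros.
  def sprandn(m: Int, n: Int, density: Double, rng: Random): CscMatrix =
    sprandWith(m, n, density, rng)(rng.nextGaussian())
}

val bigEye = CscMatrix.speye(10000) // stores 10,000 values, not 10^8
```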

      Factory methods for `Matrices`:

      • Include all the factory methods given above, but return a generic `Matrix` rather than `SparseMatrix` or `DenseMatrix`.
      • horzCat: Horizontally concatenate matrices to form one larger matrix. Very useful both in Multi Model Training and for repartitioning BlockMatrix.
      • vertCat: Vertically concatenate matrices to form one larger matrix. Very useful for repartitioning BlockMatrix.
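      With column-major storage the two concatenations differ in cost: horizontal concatenation of blocks with equal row counts is just appending their value arrays, while vertical concatenation must interleave the blocks column by column. A minimal sketch under that assumption, using a hypothetical `Dense` class and `Concat` object rather than MLlib's types, and covering dense inputs only:

```scala
// Hypothetical column-major dense matrix for illustration (not MLlib's API).
final case class Dense(numRows: Int, numCols: Int, values: Array[Double]) {
  require(values.length == numRows * numCols)
  def apply(i: Int, j: Int): Double = values(i + j * numRows)
}

object Concat {
  // horzcat: all blocks share numRows, so the result's columns are the blocks'
  // columns in order and the column-major arrays can be appended directly.
  def horzcat(ms: Seq[Dense]): Dense = {
    require(ms.nonEmpty, "need at least one matrix")
    val rows = ms.head.numRows
    require(ms.forall(_.numRows == rows), "horzcat requires equal numRows")
    Dense(rows, ms.map(_.numCols).sum, ms.flatMap(_.values).toArray)
  }

  // vertcat: all blocks share numCols. Output column j is column j of every
  // block stacked top to bottom.
  def vertcat(ms: Seq[Dense]): Dense = {
    require(ms.nonEmpty, "need at least one matrix")
    val cols = ms.head.numCols
    require(ms.forall(_.numCols == cols), "vertcat requires equal numCols")
    val rows = ms.map(_.numRows).sum
    val out = new Array[Double](rows * cols)
    for (j <- 0 until cols) {
      var rowOffset = 0
      for (m <- ms) {
        // Copy column j of block m into rows [rowOffset, rowOffset + m.numRows) of output column j.
        System.arraycopy(m.values, j * m.numRows, out, rowOffset + j * rows, m.numRows)
        rowOffset += m.numRows
      }
    }
    Dense(rows, cols, out)
  }
}

val a = Dense(2, 1, Array(1.0, 2.0))           // column [1; 2]
val b = Dense(2, 2, Array(3.0, 4.0, 5.0, 6.0)) // [[3, 5], [4, 6]]
val c = Dense(1, 2, Array(7.0, 8.0))           // row [7, 8]
val h = Concat.horzcat(Seq(a, b))              // 2 x 3: [[1, 3, 5], [2, 4, 6]]
val v = Concat.vertcat(Seq(b, c))              // 3 x 2: [[3, 5], [4, 6], [7, 8]]
```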


            People

            • Assignee:
              brkyvz Burak Yavuz
              Reporter:
              brkyvz Burak Yavuz
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved: