SPARK-17721

Erroneous computation in multiplication of transposed SparseMatrix with SparseVector
Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.1, 1.5.2, 1.6.2, 2.0.0
    • Fix Version/s: 1.5.3, 1.6.3, 2.0.2, 2.1.0
    • Component/s: ML, MLlib
    • Environment: Verified on OS X with Spark 1.6.1 and on Databricks running Spark 1.6.1

    Description

      There is a bug in how a transposed SparseMatrix (isTransposed=true) is multiplied by a SparseVector. The bug is present in org.apache.spark.mllib.linalg.BLAS (mllib) and, for versions 2.0.0 and later, also in org.apache.spark.ml.linalg.BLAS (mllib-local). In both cases it is in the private gemv method with signature:

      gemv(alpha: Double, A: SparseMatrix, x: SparseVector, beta: Double, y: DenseVector).
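
      For reference, gemv is expected to compute y := alpha * A * x + beta * y. The following is a minimal sketch of that computation in plain Scala, not Spark's actual code. It assumes the transposed matrix is held in compressed sparse row (CSR) form with column indices sorted within each row. The object name SparseGemvSketch and its parameter names are hypothetical. Each row is combined with x by a two-cursor merge that advances whichever cursor points at the smaller index.

```scala
// Illustrative sketch only; not the implementation in Spark's BLAS object.
// Assumption: A is stored in CSR form, i.e. rowPtrs(r) until rowPtrs(r + 1)
// index the nonzeros of row r, and colIndices are sorted within each row.
object SparseGemvSketch {

  /** Computes y := alpha * A * x + beta * y in place for CSR A and sparse x. */
  def gemv(
      alpha: Double,
      numRows: Int,
      rowPtrs: Array[Int],
      colIndices: Array[Int],
      values: Array[Double],
      xIndices: Array[Int],
      xValues: Array[Double],
      beta: Double,
      y: Array[Double]): Unit = {
    var row = 0
    while (row < numRows) {
      var i = rowPtrs(row)          // cursor into the nonzeros of this row
      val rowEnd = rowPtrs(row + 1)
      var k = 0                     // cursor into the nonzeros of x, restarted per row
      var sum = 0.0
      while (i < rowEnd && k < xIndices.length) {
        if (colIndices(i) == xIndices(k)) {
          // Matching index: this pair contributes to the dot product.
          sum += values(i) * xValues(k)
          i += 1
          k += 1
        } else if (colIndices(i) < xIndices(k)) {
          i += 1                    // matrix entry has no partner in x
        } else {
          k += 1                    // x entry has no partner in this row
        }
      }
      y(row) = beta * y(row) + alpha * sum
      row += 1
    }
  }

  /** The 2x3 transposed matrix and sparse vector from this report. */
  def example(): Array[Double] = {
    val y = Array(0.0, 0.0)
    gemv(
      alpha = 1.0,
      numRows = 2,
      rowPtrs = Array(0, 2, 4),
      colIndices = Array(1, 2, 0, 1),
      values = Array(2.0, 1.0, 1.0, 2.0),
      xIndices = Array(1, 2),
      xValues = Array(2.0, 1.0),
      beta = 0.0,
      y = y)
    y // Array(5.0, 4.0)
  }
}
```

      Calling SparseGemvSketch.example() runs the sketch on the matrix and vector used in the reproduction in this report.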

      This bug can be verified by running the following snippet in a Spark shell (here using v1.6.1):

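      import org.apache.spark.mllib.linalg._
      
      // 3x2 column-major matrix, converted to sparse and transposed (isTransposed = true), giving a 2x3 matrix
      val A = Matrices.dense(3, 2, Array[Double](0, 2, 1, 1, 2, 0)).asInstanceOf[DenseMatrix].toSparse.transpose
      // Sparse vector of size 3 with nonzeros at indices 1 and 2, i.e. [0.0, 2.0, 1.0]
      val b = Vectors.sparse(3, Seq[(Int, Double)]((1, 2), (2, 1))).asInstanceOf[SparseVector]
      
      A.multiply(b)          // transposed SparseMatrix times SparseVector
      A.multiply(b.toDense)  // same product with a DenseVector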
      

      The first multiply, which takes the SparseVector, returns the incorrect result:

      org.apache.spark.mllib.linalg.DenseVector = [5.0,0.0]
      

      whereas the second multiply, which takes the DenseVector, returns the correct result:

      org.apache.spark.mllib.linalg.DenseVector = [5.0,4.0]
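
      The expected value can also be checked by hand without Spark. The sketch below writes the transposed matrix and the vector out densely and takes row-by-row dot products. The object name DenseCheck and its members are hypothetical and exist only for this illustration.

```scala
// Hand check of the expected product using plain Scala arrays (no Spark).
object DenseCheck {
  // The original 3x2 matrix is column-major with columns (0, 2, 1) and
  // (1, 2, 0), so its transpose has those columns as its two rows.
  val aT: Array[Array[Double]] =
    Array(Array(0.0, 2.0, 1.0), Array(1.0, 2.0, 0.0))

  // b has size 3 with 2.0 at index 1 and 1.0 at index 2.
  val b: Array[Double] = Array(0.0, 2.0, 1.0)

  /** Dense matrix-vector product: one dot product per row. */
  def multiply(m: Array[Array[Double]], v: Array[Double]): Array[Double] =
    m.map(row => row.zip(v).map { case (a, x) => a * x }.sum)

  def main(args: Array[String]): Unit = {
    // Row 0: 0*0 + 2*2 + 1*1 = 5; row 1: 1*0 + 2*2 + 0*1 = 4
    println(multiply(aT, b).mkString("[", ",", "]")) // prints [5.0,4.0]
  }
}
```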
      

          People

            Assignee: Bjarne Fruergaard (bfruergaard)
            Reporter: Bjarne Fruergaard (bfruergaard)