[SPARK-17721] Erroneous computation in multiplication of transposed SparseMatrix with SparseVector - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.4.1, 1.5.2, 1.6.2, 2.0.0
Fix Version/s: 1.5.3, 1.6.3, 2.0.2, 2.1.0
Component/s: ML, MLlib
Labels: correctness
Environment: Verified on OS X with Spark 1.6.1 and on Databricks running Spark 1.6.1
Target Version/s: 1.5.3, 1.6.3, 2.0.2, 2.1.0

Description

There is a bug in how a transposed SparseMatrix (isTransposed=true) is multiplied with a SparseVector. The bug is in the private gemv method of org.apache.spark.mllib.linalg.BLAS (mllib) and, from v2.0.0 onward, also of org.apache.spark.ml.linalg.BLAS (mllib-local). The method has the signature:

gemv(alpha: Double, A: SparseMatrix, x: SparseVector, beta: Double, y: DenseVector).

This bug can be verified by running the following snippet in a Spark shell (here using v1.6.1):

import org.apache.spark.mllib.linalg._

// A is the 2 x 3 matrix [[0, 2, 1], [1, 2, 0]], stored as a transposed SparseMatrix
val A = Matrices.dense(3, 2, Array[Double](0, 2, 1, 1, 2, 0)).asInstanceOf[DenseMatrix].toSparse.transpose
// b is the sparse vector [0, 2, 1]
val b = Vectors.sparse(3, Seq[(Int, Double)]((1, 2), (2, 1))).asInstanceOf[SparseVector]

A.multiply(b)         // transposed SparseMatrix times SparseVector
A.multiply(b.toDense) // transposed SparseMatrix times DenseVector

The first multiply, with the SparseVector, returns the incorrect result:

org.apache.spark.mllib.linalg.DenseVector = [5.0,0.0]

whereas the correct result is returned by the second multiply:

org.apache.spark.mllib.linalg.DenseVector = [5.0,4.0]
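
The expected value can be cross-checked without Spark. The following is a minimal sketch, not the Spark implementation: it assumes the transposed matrix is held in row-compressed (CSR) form and walks each row's nonzeros alongside the vector's nonzeros with two cursors. The names CsrGemvSketch, Csr, and multiply are hypothetical and introduced only for illustration. By hand, row 0 gives 2*2 + 1*1 = 5 and row 1 gives 2*2 = 4.

```scala
// Hypothetical reference implementation, independent of Spark's BLAS.gemv.
object CsrGemvSketch {
  /** Row-compressed (CSR) matrix; assumed here as the layout of a transposed sparse matrix. */
  final case class Csr(numRows: Int, numCols: Int,
                       rowPtrs: Array[Int], colIndices: Array[Int], values: Array[Double])

  /** Computes A * x for a CSR matrix and a sparse vector given as sorted indices plus values. */
  def multiply(a: Csr, xIndices: Array[Int], xValues: Array[Double]): Array[Double] = {
    val y = new Array[Double](a.numRows)
    var row = 0
    while (row < a.numRows) {
      var i = a.rowPtrs(row)           // cursor into this row's nonzeros
      val rowEnd = a.rowPtrs(row + 1)
      var k = 0                        // cursor into x's nonzeros, restarted for every row
      var sum = 0.0
      while (i < rowEnd && k < xIndices.length) {
        val col = a.colIndices(i)
        val xi = xIndices(k)
        if (col == xi) { sum += a.values(i) * xValues(k); i += 1; k += 1 }
        else if (col < xi) i += 1      // matrix cursor lags: advance it
        else k += 1                    // vector cursor lags: advance it
      }
      y(row) = sum
      row += 1
    }
    y
  }

  // The 2 x 3 matrix from the report, [[0, 2, 1], [1, 2, 0]], with zeros omitted.
  val a: Csr = Csr(2, 3, Array(0, 2, 4), Array(1, 2, 0, 1), Array(2.0, 1.0, 1.0, 2.0))

  // b = [0, 2, 1] as a sparse vector: indices (1, 2), values (2.0, 1.0).
  val result: Array[Double] = multiply(a, Array(1, 2), Array(2.0, 1.0))
}
```

CsrGemvSketch.result holds 5.0 and 4.0, which agrees with the dense-vector multiply above.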

Issue Links

links to

[Github] Pull Request #15296 (bwahlgreen)

[Github] Pull Request #15311 (bwahlgreen)

People

Assignee: Bjarne Fruergaard
Reporter: Bjarne Fruergaard
Votes: 0
Watchers: 3

Dates

Created: 29/Sep/16 09:22
Updated: 02/Oct/16 02:30
Resolved: 02/Oct/16 02:29