Spark / SPARK-10875

RowMatrix.computeCovariance() result is not exactly symmetric

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.6.0
    • Component/s: MLlib
    • Labels:
      None

      Description

      For some matrices, I have seen that the covariance matrix returned by RowMatrix.computeCovariance() is not exactly symmetric, most likely because of floating-point rounding errors. This is a problem when constructing an instance of MultivariateGaussian, which requires an exactly symmetric covariance matrix. See the reproducible example below.

      I would suggest modifying the implementation so that G(i, j) and G(j, i) are set at the same time, with the same value.

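      // Imports needed to run this in spark-shell, where sc is predefined
      import org.apache.spark.mllib.linalg.distributed.RowMatrix
      import org.apache.spark.mllib.random.RandomRDDs
      import org.apache.spark.mllib.stat.distribution.MultivariateGaussian
      val rdd = RandomRDDs.normalVectorRDD(sc, 100, 10, 0, 0)
      val matrix = new RowMatrix(rdd)
      val mean = matrix.computeColumnSummaryStatistics().mean
      val cov = matrix.computeCovariance()
      val dist = new MultivariateGaussian(mean, cov) // throws breeze.linalg.MatrixNotSymmetricException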
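
To illustrate the suggestion of setting G(i, j) and G(j, i) together, here is a minimal sketch in plain Scala. It is not the MLlib implementation and not necessarily how the fix was made. The object `SymmetricCovariance`, its method `covariance`, and the column-major array layout are assumptions made for this example. It takes the Gramian G = AᵀA, the column means, and the row count m, computes each entry once from the upper triangle, and writes that single value to both mirrored positions, so the result is symmetric by construction.

```scala
// Hypothetical helper for illustration only; not part of Spark MLlib.
object SymmetricCovariance {

  /**
   * Builds an n x n sample covariance matrix in column-major order.
   *
   * @param gram column-major n x n Gramian (sum over rows of x_i * x_j)
   * @param mean column means, length n
   * @param m    number of rows in the original matrix
   */
  def covariance(gram: Array[Double], mean: Array[Double], m: Long): Array[Double] = {
    val n = mean.length
    require(gram.length == n * n, "gram must hold n * n entries")
    require(m > 1, "at least two rows are needed for a sample covariance")

    val cov = new Array[Double](n * n)
    var i = 0
    while (i < n) {
      var j = i
      while (j < n) {
        // Compute the entry once, reading only the upper triangle of gram ...
        val v = (gram(i + j * n) - m * mean(i) * mean(j)) / (m - 1)
        // ... and store the same value at (i, j) and (j, i).
        cov(i + j * n) = v
        cov(j + i * n) = v
        j += 1
      }
      i += 1
    }
    cov
  }
}
```

Because each mirrored pair holds one and the same Double, a strict symmetry check on the result passes regardless of rounding in the inputs. On an affected version, a caller could apply the same mirroring to the output of computeCovariance() before constructing MultivariateGaussian; that workaround is a suggestion, not something confirmed in this report.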
      

            People

            • Assignee:
              pnpritchard Nick Pritchard
              Reporter:
              pnpritchard Nick Pritchard
              Shepherd:
              Xiangrui Meng