SPARK-9005

RegressionMetrics computing incorrect explainedVariance

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: MLlib
    • Labels:
      None
    • Target Version/s:

      Description

      RegressionMetrics currently computes explainedVariance from summary.variance(1), the variance of the residuals, whereas the Wikipedia definition uses the residual sum of squares, math.pow(summary.normL2(1), 2). The two agree (up to normalization by the sample count) only when the predictor is unbiased, i.e. the residuals have zero mean, as happens for example when a linear model includes an intercept term. This is not always the case, so we should change the computation to be consistent with the Wikipedia definition.
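
      The discrepancy can be illustrated with a small sketch in plain Python. It does not use the Spark API; the function names and the toy data are hypothetical, and it assumes that the variance in question is the sample variance (divided by n - 1) of the residuals label - prediction.

```python
def residual_variance(labels, predictions):
    """Sample variance of the residuals, centred on the residual mean."""
    residuals = [y - p for y, p in zip(labels, predictions)]
    n = len(residuals)
    mean = sum(residuals) / n
    return sum((r - mean) ** 2 for r in residuals) / (n - 1)


def residual_sum_of_squares(labels, predictions):
    """Squared L2 norm of the residuals, with no centring."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions))


# Hypothetical toy data, not taken from the issue.
labels = [1.0, 2.0, 3.0, 4.0]

# Biased predictor: every prediction is 1.0 too high, so the residuals
# are all -1.0 and their mean is non-zero.
biased = [y + 1.0 for y in labels]
print(residual_variance(labels, biased))        # 0.0
print(residual_sum_of_squares(labels, biased))  # 4.0

# Zero-mean residuals (-0.5, 0.5, -0.5, 0.5): the two quantities now
# differ only by the normalization factor n - 1.
unbiased = [1.5, 1.5, 3.5, 3.5]
n = len(labels)
print(residual_variance(labels, unbiased) * (n - 1))  # 1.0
print(residual_sum_of_squares(labels, unbiased))      # 1.0
```

      For the biased predictor the centred variance is zero even though every prediction is wrong, which is the kind of mismatch the description refers to.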

            People

            • Assignee:
              fliang Feynman Liang
              Reporter:
              fliang Feynman Liang
