Spark / SPARK-11581

Example mllib code in documentation incorrectly computes MSE

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.3.1, 1.4.1, 1.5.1, 1.6.0
    • Fix Version/s: 1.4.2, 1.5.3, 1.6.0
    • Component/s: Documentation
    • Labels:

      Description

      The example Java code at the bottom of the mllib-decision-tree web page shows how to compute MSE on the test data, but it contains a bug. The squared errors are summed over testData, yet the sum is divided by data.count(). It should be divided by testData.count() instead, so that the denominator matches the number of records that were summed.

      http://spark.apache.org/docs/latest/mllib-decision-tree.html

      Double testMSE =
        predictionAndLabel.map(new Function<Tuple2<Double, Double>, Double>() {
          @Override
          public Double call(Tuple2<Double, Double> pl) {
            Double diff = pl._1() - pl._2();
            return diff * diff;
          }
        }).reduce(new Function2<Double, Double, Double>() {
          @Override
          public Double call(Double a, Double b) {
            return a + b;
          }
        }) / data.count();
      System.out.println("Test Mean Squared Error: " + testMSE);
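
      In the snippet above, the fix described in this issue amounts to changing the final divisor from data.count() to testData.count(). To show why the denominator matters, here is a minimal plain-Java sketch that does not use Spark. The class name TestMseExample, the helper methods, and all numbers are hypothetical and chosen only for illustration. It assumes a full data set of 10 records, of which 4 were held out for testing.

```java
import java.util.Arrays;
import java.util.List;

public class TestMseExample {
    // Sum of squared differences; each pair is {prediction, label}.
    static double sumSquaredError(List<double[]> predictionAndLabel) {
        double sum = 0.0;
        for (double[] pl : predictionAndLabel) {
            double diff = pl[0] - pl[1];
            sum += diff * diff;
        }
        return sum;
    }

    // Mean squared error: divide by the number of pairs that were summed.
    static double meanSquaredError(List<double[]> predictionAndLabel) {
        return sumSquaredError(predictionAndLabel) / predictionAndLabel.size();
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 10 records overall, 4 of them in the test split.
        long dataCount = 10;
        List<double[]> testPairs = Arrays.asList(
            new double[] {1.0, 0.0},
            new double[] {0.0, 0.0},
            new double[] {3.0, 1.0},
            new double[] {2.0, 3.0});

        // Mirrors the documented bug: test-set errors divided by the full count.
        double dividedByDataCount = sumSquaredError(testPairs) / dataCount;
        // Mirrors the fix: test-set errors divided by the test-set count.
        double dividedByTestCount = meanSquaredError(testPairs);

        System.out.println("Divided by data count: " + dividedByDataCount);
        System.out.println("Divided by test count: " + dividedByTestCount);
    }
}
```

      With these made-up numbers, dividing by the full count shrinks the reported error by the ratio of test records to total records (4/10 here), which is why the documented value understates the test MSE.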

            People

            • Assignee: bharat M Bharat lal
            • Reporter: bwebb Brian Webb