Spark / SPARK-32904

pyspark.mllib.evaluation.MulticlassMetrics needs to swap the results of precision() and recall()


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.0.1
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels: None

    Description

      https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/evaluation/MulticlassMetrics.html

      The values returned by the precision() and recall() methods of this API should be swapped.

      Below are example results from running this API. The code prints the confusion matrix, precision(1), and recall(1):

      metrics = MulticlassMetrics(predictionAndLabels)
      print(metrics.confusionMatrix().toArray())
      print("precision:", metrics.precision(1))
      print("recall:", metrics.recall(1))

      Output:

      [[36631.  2845.]
       [ 3839.  1610.]]
      precision: 0.3613916947250281
      recall: 0.2954670581758121

      predictions.select('prediction').agg({'prediction': 'sum'}).show()

      sum(prediction): 5449.0

      As you can see, my model predicted 5449 cases with label=1, and 1610 of those were true positives, so precision should be 1610/5449 = 0.2954670581758121. This API instead assigns that value to recall(); the two should be swapped.
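      For context on the resolution: MulticlassMetrics.confusionMatrix() is documented with predicted classes in columns and actual classes in rows. Below is a minimal NumPy sketch (not part of the original report) showing how both of the reported numbers follow from the matrix above under that convention; the row sum for label 1 is 5449 (actual positives), while the column sum is 4455 (predicted positives).

      ```python
      import numpy as np

      # Confusion matrix from the report; assuming MLlib's documented layout:
      # rows = actual labels, columns = predicted labels.
      cm = np.array([[36631.0, 2845.0],
                     [3839.0, 1610.0]])

      label = 1
      tp = cm[label, label]                # 1610 true positives for label 1
      precision = tp / cm[:, label].sum()  # TP / all predicted as label 1 = 1610/4455
      recall = tp / cm[label, :].sum()     # TP / all actually label 1     = 1610/5449

      print("precision:", precision)  # ≈ 0.3614, matching the report's precision output
      print("recall:", recall)        # ≈ 0.2955, matching the report's recall output
      ```

      Under this reading, 5449 counts the cases whose actual label is 1 (and it happens that sum(prediction) was also 5449.0 here), so 1610/5449 is the recall, exactly as the API returns it.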

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: TinaLi (tinaliendurance)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: