Spark / SPARK-32904

pyspark.mllib.evaluation.MulticlassMetrics needs to swap the results of precision( ) and recall( )


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.0.1
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels: None

      Description

      https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/evaluation/MulticlassMetrics.html

      The values returned by the precision() and recall() methods of this API should be swapped.

      Below are the example results I got when running this API:

      metrics = MulticlassMetrics(predictionAndLabels)
      print(metrics.confusionMatrix().toArray())
      print("precision: ", metrics.precision(1))
      print("recall: ", metrics.recall(1))

      [[36631.  2845.]
       [ 3839.  1610.]]

      precision: 0.3613916947250281
      recall: 0.2954670581758121

       
      predictions.select('prediction').agg({'prediction':'sum'}).show()

      sum(prediction) = 5449.0

      As you can see, my model predicted 5449 cases with label=1, and 1610 of those 5449 cases are true positives, so precision should be 1610/5449 = 0.2954670581758121. Instead, this API assigns that value to the recall() method; the two results should be swapped.
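      For reference, both metrics can be recomputed directly from the printed confusion matrix. The sketch below assumes the convention documented for MulticlassMetrics.confusionMatrix(): actual labels in rows, predicted labels in columns. Under that assumption, the API's reported values fall out of the column and row sums:

      ```python
      # Confusion matrix from the report above.
      # Assumed convention (per the MulticlassMetrics docs): rows = actual
      # labels, columns = predicted labels, ordered by class label ascending.
      cm = [[36631.0, 2845.0],
            [3839.0, 1610.0]]

      def precision(cm, label):
          # Precision for `label`: TP / (TP + FP) = diagonal entry / column sum.
          column_sum = sum(row[label] for row in cm)
          return cm[label][label] / column_sum

      def recall(cm, label):
          # Recall for `label`: TP / (TP + FN) = diagonal entry / row sum.
          return cm[label][label] / sum(cm[label])

      print(precision(cm, 1))  # 1610 / 4455 = 0.3613916947250281
      print(recall(cm, 1))     # 1610 / 5449 = 0.2954670581758121
      ```

      With that row/column orientation, 5449 is the row sum for label 1 (the number of actual positives), which is the denominator of recall rather than precision.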


             People

             • Assignee: Unassigned
             • Reporter: tinaliendurance (TinaLi)
