  1. Spark
  2. SPARK-17906

MulticlassClassificationEvaluator support target label

Details

    • Type: Brainstorming
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      In practice, I sometimes care only about the metrics of one particular label.
      For example, in CTR prediction, I usually care only about the F1 score of the positive class.

      In sklearn, this is supported:

      >>> from sklearn.metrics import classification_report
      >>> y_true = [0, 1, 2, 2, 2]
      >>> y_pred = [0, 0, 2, 2, 1]
      >>> target_names = ['class 0', 'class 1', 'class 2']
      >>> print(classification_report(y_true, y_pred, target_names=target_names))
                   precision    recall  f1-score   support
      
          class 0       0.50      1.00      0.67         1
          class 1       0.00      0.00      0.00         1
          class 2       1.00      0.67      0.80         3
      
      avg / total       0.70      0.60      0.61         5
      

      Currently, ml supports only the `weightedXXX` metrics, so I think there is room for improvement here.
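
To make the difference between a per-label metric and a weighted one concrete, here is a minimal sketch in plain Python that reuses the labels from the sklearn example above. The helper names `per_label_f1` and `weighted_f1` are hypothetical and belong to neither sklearn nor Spark; the sketch assumes that the weighted F1 is the per-label F1 averaged by each label's support, which matches the `avg / total` row in the report.

```python
from collections import Counter


def per_label_f1(y_true, y_pred, label):
    """F1 of a single label, treating that label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-label F1 averaged with weights proportional to each label's support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_label_f1(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

print(round(per_label_f1(y_true, y_pred, 2), 2))  # 0.8, the "class 2" row
print(round(weighted_f1(y_true, y_pred), 2))      # 0.61, the "avg / total" row
```

The single-label value (0.80 for class 2) is what this request asks the evaluator to expose, while the weighted value (0.61) corresponds to what a weighted metric reports.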

      The API could be designed like this:

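      import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

      val dataset = ...
      val evaluator = new MulticlassClassificationEvaluator()
      evaluator.setMetricName("f1")
      evaluator.evaluate(dataset)       // weighted F1 over all classes
      
      // Proposed API: setTarget does not exist yet
      evaluator.setTarget(0.0).setMetricName("f1")
      evaluator.evaluate(dataset)       // F1 of class 0.0 only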
      

      What's your opinion? yanboliang josephkb sethah srowen
      If this is useful and acceptable, I'm happy to work on this.

          People

            Assignee: Unassigned
            Reporter: Ruifeng Zheng (podongfeng)
            Votes: 0
            Watchers: 3
