Hivemall / HIVEMALL-78

AUC UDAF for BinaryClassificationMetrics


    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Labels:
      None

      Description

      Support the area under the ROC curve (AUROC), one of the binary classification metrics, as a Hive UDAF.
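For reference, AUROC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties count as one half). It can therefore be computed from score ranks alone (the Mann-Whitney U statistic) without materialising the ROC curve. The Python sketch below shows what an option-1 style `auroc(label, prob)` would return. It is not Hivemall's implementation; the function name only mirrors the SQL, and it assumes labels are encoded as 0/1.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """AUROC as P(score of random positive > score of random negative), ties = 1/2.

    Assumes labels are 0/1 (a hypothetical encoding for this sketch).
    """
    pairs = sorted(zip(probs, labels))  # ascending by score
    n = len(pairs)
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        # find the run of tied scores pairs[i..j]; each gets the average 1-based rank
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2.0 + 1.0
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC is undefined unless both classes are present")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # Mann-Whitney U of the positives, normalised by the number of (pos, neg) pairs
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

For example, `auroc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])` is 0.75: three of the four (positive, negative) pairs are ordered correctly.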

      -- option 1
      select
         auroc(label, prob) as auroc
      from
         data;
      
      -- option 2
      WITH roc as (
        select
          roc(label, prob) as (fpr, tpr)
        from
          data
      )
      select
        auc(fpr, tpr) as auroc -- auc is a UDAF; its input must be sorted by fpr ASC (ties by tpr ASC)
      from (
        select
          fpr, tpr
        from
          roc
        DISTRIBUTE BY
          floor(fpr / 0.2) -- 5 bins over [0, 1); fpr = 1.0 falls into a 6th
        SORT BY
          fpr ASC, tpr ASC
      ) t;
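Option 2 only works if each bin's partial result carries enough state to be merged: the area of the trapezoids inside the bin is not sufficient, because the segment joining the last point of one bin to the first point of the next also contributes area. The Python sketch below illustrates one way to do this under stated assumptions: ROC points are emitted once per distinct threshold, the curve is linearly interpolated (trapezoidal rule), and each partial is `(area, first point, last point)`. The names `roc_points`, `partial_auc`, `merge` and `binned_auc` are hypothetical and not Hivemall's API. Since partial aggregates need not arrive in fpr order, `merge` sorts them by their first fpr before folding. Points that share an fpr differ in tpr, so the per-bin sort orders by `(fpr, tpr)`; sorting by fpr alone could visit a higher-tpr point first and miscompute the adjacent trapezoids.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]           # (fpr, tpr)
Partial = Tuple[float, Point, Point]  # (area inside bin, first point, last point)


def trapezoid(p: Point, q: Point) -> float:
    """Area under the straight segment from p to q (fpr on x, tpr on y)."""
    return (q[0] - p[0]) * (p[1] + q[1]) / 2.0


def roc_points(labels: Sequence[int], probs: Sequence[float]) -> List[Point]:
    """One (fpr, tpr) point per distinct threshold, starting at (0, 0); labels are 0/1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC is undefined unless both classes are present")
    order = sorted(zip(probs, labels), key=lambda t: -t[0])  # score descending
    tp = fp = 0
    points: List[Point] = [(0.0, 0.0)]
    i = 0
    while i < len(order):
        threshold = order[i][0]
        # consume every example tied at this threshold before emitting a point
        while i < len(order) and order[i][0] == threshold:
            if order[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(points: Sequence[Point]) -> Partial:
    """Per-bin state; `points` must already be sorted by fpr ASC, tpr ASC."""
    area = sum(trapezoid(points[k], points[k + 1]) for k in range(len(points) - 1))
    return area, points[0], points[-1]


def merge(partials: Sequence[Partial]) -> float:
    """Sum the bin areas plus the trapezoid bridging each pair of adjacent bins."""
    total = 0.0
    prev_last: Optional[Point] = None
    for area, first, last in sorted(partials, key=lambda s: s[1][0]):
        total += area
        if prev_last is not None:
            total += trapezoid(prev_last, first)
        prev_last = last
    return total


def binned_auc(labels: Sequence[int], probs: Sequence[float], width: float = 0.2) -> float:
    """Mimics DISTRIBUTE BY floor(fpr / width) SORT BY fpr, tpr, then a merging auc()."""
    bins: Dict[int, List[Point]] = {}
    for p in roc_points(labels, probs):
        bins.setdefault(math.floor(p[0] / width), []).append(p)
    # sorted() on (fpr, tpr) tuples plays the role of the per-reducer SORT BY
    partials = [partial_auc(sorted(pts)) for pts in bins.values()]
    return merge(partials)
```

With labels `[1, 0, 1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.6, 0.4, 0.1]`, every bin holds only vertical segments, so each per-bin area is 0 and the whole AUROC of 2/3 comes from the bridging trapezoids; a merge that simply summed bin areas would return 0.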
      

      References:
      http://www.citeulike.org/user/myui/article/12615084
      https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
      https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html

              People

              • Assignee: takuti Takuya Kitazawa
              • Reporter: myui Makoto Yui
              • Votes: 0
              • Watchers: 1
