For binary classifiers, calibration measures how classifier scores compare to the observed proportion of positive examples. A classifier is well-calibrated if, among the examples that receive a score near s, the proportion of positives is approximately s. This matters when the scores are used as probabilities for making decisions via expected cost. Even otherwise, the calibration curve can still be informative: the proportion of positive examples should at least be a monotonic function of the score.
I propose adding a new calibration method to the class BinaryClassificationMetrics, since calibration fits naturally alongside the ROC curve and the other classifier assessments that class provides.
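One way such a method might compute a calibration curve is the standard binning approach: partition the score range into equal-width bins and, for each bin, compare the mean predicted score with the empirical fraction of positives. The sketch below is in Python with NumPy rather than the Scala/Spark API, and the function name and signature are hypothetical, purely for illustration:

```python
import numpy as np

def calibration_curve(scores, labels, n_bins=10):
    """Hypothetical sketch: bin scores into n_bins equal-width bins on [0, 1]
    and return (mean score, fraction of positives) for each non-empty bin."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # np.digitize maps each score to a bin index; clip so a score of
    # exactly 1.0 falls into the last bin rather than past it.
    idx = np.clip(np.digitize(scores, edges) - 1, 0, n_bins - 1)
    curve = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            curve.append((scores[mask].mean(), labels[mask].mean()))
    return curve
```

For a well-calibrated classifier the resulting points lie near the diagonal (mean score equals fraction of positives in each bin); for a merely monotonic classifier the second coordinate is still non-decreasing in the first.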
For more about calibration, see: http://en.wikipedia.org/wiki/Calibration_%28statistics%29#In_classification
Mahdi Pakdaman Naeini, Gregory F. Cooper, Milos Hauskrecht. "Binary Classifier Calibration: Non-parametric approach." http://arxiv.org/abs/1401.3390
Alexandru Niculescu-Mizil, Rich Caruana. "Predicting Good Probabilities With Supervised Learning." In Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005. http://www.cs.cornell.edu/~alexn/papers/calibration.icml05.crc.rev3.pdf
Ira Cohen, Moises Goldszmidt. "Properties and benefits of calibrated classifiers." HP Labs Technical Report HPL-2004-22R1. http://www.hpl.hp.com/techreports/2004/HPL-2004-22R1.pdf