[SPARK-17476] Proper handling for unseen labels in logistic regression training. - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: ML
Labels:
- bulk-closed

Description

Now that logistic regression supports multiclass classification, it is possible to train on data that has K classes where one or more of the classes do not appear in the training set. For example, given (label, features) rows:

(0.0, x1)
(2.0, x2)
...

Currently, logistic regression assumes that the label in the above dataset has three levels: 0, 1, and 2. Since label 1 never appears in training, it should never be predicted. In theory, its coefficients should be zero and its intercept should be negative infinity. This can cause problems because we center the intercepts after training, and centering subtracts the mean intercept, which is not finite when one intercept is negative infinity.
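To make the "three levels" point concrete, here is a minimal sketch in plain Python (not Spark code). It assumes the class count is taken as the largest observed label plus one, which is consistent with the three levels described above; the helper name `unseen_labels` is hypothetical.

```python
def unseen_labels(labels):
    """Return (num_classes, missing), assuming labels are 0-based class indices."""
    observed = {int(y) for y in labels}
    # Assumption: the class count is inferred as max label + 1.
    num_classes = max(observed) + 1
    # Any index below num_classes that never occurs is an unseen class.
    missing = sorted(set(range(num_classes)) - observed)
    return num_classes, missing


# Labels from the example rows (0.0, x1) and (2.0, x2).
num_classes, missing = unseen_labels([0.0, 2.0])
```

A check like this could be run before training to detect the situation, though whether to warn, drop the class, or train anyway is the open question of this issue.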

We should discuss whether the intercepts actually tend to -infinity in practice, and whether we should even include them in training.
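One way to get a feel for the first question is a toy experiment. The sketch below is plain Python, not Spark's optimizer: it fits an intercept-only, unregularized softmax model by fixed-step gradient descent on labels 0 and 2 with three assumed classes. The function names, step size, and iteration counts are illustrative assumptions.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def fit_intercepts(labels, num_classes, steps, lr=1.0):
    """Gradient descent on the mean multinomial log-loss, intercepts only."""
    n = len(labels)
    counts = [0] * num_classes
    for y in labels:
        counts[int(y)] += 1
    freq = [c / n for c in counts]  # empirical class frequencies
    b = [0.0] * num_classes
    for _ in range(steps):
        p = softmax(b)
        # Gradient of the mean loss w.r.t. intercept k is p_k - freq_k.
        b = [bk - lr * (pk - fk) for bk, pk, fk in zip(b, p, freq)]
    return b


labels = [0.0, 2.0, 0.0, 2.0]  # class 1 never appears
early = fit_intercepts(labels, 3, 100)
later = fit_intercepts(labels, 3, 10000)
print("after 100 steps:  ", early)
print("after 10000 steps:", later)
```

In this toy setting the gradient for the unseen class is its predicted probability, which is always positive, so its intercept keeps decreasing for as long as training runs and never settles at a finite value. Regularization or a convergence tolerance would stop it at some finite number, so this says nothing definitive about what Spark's solvers produce in practice.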

People

Assignee: Unassigned

Reporter: Seth Hendrickson

Votes: 0

Watchers: 4

Dates

Created: 09/Sep/16 18:10

Updated: 21/May/19 04:33

Resolved: 21/May/19 04:33