[SPARK-6705] MLLIB ML Pipeline's Logistic Regression has no intercept term - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.0
Component/s: ML, MLlib
Labels: None

Description

Currently, the ML Pipeline's LogisticRegression.scala file does not allow setting whether or not to fit an intercept term. The pipeline therefore defers to LogisticRegressionWithLBFGS, which does not fit an intercept term. This makes sense from a performance point of view, because adding an intercept term requires extra memory allocation.

However, this is statistically undesirable: the conventional default is to include an intercept term, and omitting one requires a very strong reason.
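To make the statistical point concrete, here is a minimal pure-Python sketch (not Spark code) of one-feature logistic regression fit by batch gradient descent. The `fit_logistic` function and its `fit_intercept` flag are hypothetical illustrations, not the MLlib API. On toy data where 80% of the labels are positive and the feature carries no signal, the model without an intercept is pinned to sigmoid(0) = 0.5, while the model with an intercept recovers the base rate.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, fit_intercept, lr=0.5, iters=2000):
    """Hypothetical helper: fit a 1-D logistic regression by batch gradient
    descent on the mean log-loss. Returns (weight, intercept); the intercept
    stays 0.0 when fit_intercept is False."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the margin
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        if fit_intercept:
            b -= lr * grad_b / n
    return w, b


# Toy data: the feature is uninformative and 80% of the labels are positive.
xs = [-1.0] * 5 + [1.0] * 5
ys = [1, 1, 1, 1, 0] * 2

w0, b0 = fit_logistic(xs, ys, fit_intercept=False)
w1, b1 = fit_logistic(xs, ys, fit_intercept=True)

p_without = sigmoid(w0 * 1.0 + b0)
p_with = sigmoid(w1 * 1.0 + b1)
print(f"without intercept: P(y=1 | x=1) = {p_without:.3f}")
print(f"with intercept:    P(y=1 | x=1) = {p_with:.3f}")
```

Without an intercept the decision boundary is forced through the origin, so the model cannot express any base rate other than 0.5 at the point where all features are zero.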

Explicitly modeling the intercept by adding a column of all 1s does not work, because LogisticRegressionWithLBFGS forces column normalization: a column of all 1s has zero variance, so dividing by its standard deviation (0) kills it.
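The failure mode can be sketched in a few lines of plain Python. This is a hypothetical illustration, not MLlib's code: `scale_columns` divides each column by its standard deviation (no centering), and it assumes the convention of mapping a zero-variance column to a scale factor of 0 (a naive implementation would raise `ZeroDivisionError` instead). Either way, the all-1s column no longer acts as an intercept. Appending the constant column after scaling, which is one possible way to implement an explicit fit-intercept option, keeps it intact.

```python
import math


def scale_columns(rows):
    """Hypothetical helper: divide each column by its population standard
    deviation, without centering. Assumption: a zero-variance column gets a
    scale factor of 0.0 instead of raising ZeroDivisionError, so it becomes all zeros."""
    n = len(rows)
    n_cols = len(rows[0])
    factors = []
    for j in range(n_cols):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        factors.append(1.0 / std if std != 0.0 else 0.0)
    return [[v * f for v, f in zip(row, factors)] for row in rows]


def append_ones(rows):
    """Append a constant 1.0 column, i.e. an explicit intercept feature."""
    return [row + [1.0] for row in rows]


features = [[2.0], [4.0], [6.0], [8.0]]

# Adding the all-1s column *before* normalization: its std is 0, so it is zeroed out.
naive = scale_columns(append_ones(features))

# Normalizing first and appending the intercept column afterwards keeps it intact.
safe = append_ones(scale_columns(features))

print("ones column scaled with the features:", [row[-1] for row in naive])
print("ones column appended after scaling:  ", [row[-1] for row in safe])
```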

We should open up the ML Pipeline API to explicitly allow controlling whether or not to fit an intercept.

Issue Links

links to

[Github] Pull Request #5301 (oefirouz)

People

Assignee: Omede Firouz

Reporter: Omede Firouz

Votes: 0

Watchers: 3

Dates

Created: 03/Apr/15 23:35

Updated: 08/Apr/15 19:31

Resolved: 08/Apr/15 03:37