[SPARK-8522] Disable feature scaling in Linear and Logistic Regression - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.0
Component/s: ML
Labels:
None

Description

All compressed sensing applications, and some regression use cases, get better results with feature scaling turned off. However, if we implement this naively by training on the dataset without any standardization, the rate of convergence will be poor. Instead, we can still standardize the training dataset but penalize each component differently, which gives effectively the same objective function but a better-conditioned numerical problem. As a result, columns with high variance are penalized less, and vice versa. Without this, all features are standardized and are therefore penalized equally.
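
The equivalence described above can be checked numerically. The following is a minimal sketch, not the Spark implementation: it assumes squared loss with an L2 penalty, no intercept, and scaling by the sample standard deviation without centering. The toy data and all names (`objective`, `penalty_weights`, and so on) are hypothetical. Under those assumptions, a coefficient `w_j` on the raw column corresponds to `v_j = w_j * sigma_j` on the scaled column, so penalizing `v_j` with weight `1 / sigma_j**2` reproduces the unscaled objective.

```python
import math


def column_std(rows, j):
    """Sample standard deviation of column j."""
    n = len(rows)
    mean = sum(r[j] for r in rows) / n
    return math.sqrt(sum((r[j] - mean) ** 2 for r in rows) / (n - 1))


def objective(rows, labels, coef, reg, penalty_weights):
    """Mean squared loss plus a per-coefficient weighted L2 penalty."""
    n = len(rows)
    loss = sum(
        (sum(c * x for c, x in zip(coef, r)) - y) ** 2
        for r, y in zip(rows, labels)
    ) / (2 * n)
    penalty = 0.5 * reg * sum(u * c * c for u, c in zip(penalty_weights, coef))
    return loss + penalty


# Toy data (made up): the second column has a much larger variance.
rows = [[1.0, 100.0], [2.0, 250.0], [3.0, 180.0], [4.0, 400.0]]
labels = [1.5, 3.0, 2.9, 5.1]
reg = 0.1
w = [0.7, 0.004]  # arbitrary coefficients on the original scale

sigma = [column_std(rows, j) for j in range(2)]

# Scale each column by its standard deviation (assumption: no centering).
scaled_rows = [[x / s for x, s in zip(r, sigma)] for r in rows]

# The same model expressed in the scaled space.
v = [c * s for c, s in zip(w, sigma)]

# Per-coefficient penalty weights: high-variance columns are penalized less.
weights = [1.0 / (s * s) for s in sigma]

original = objective(rows, labels, w, reg, [1.0, 1.0])
rescaled = objective(scaled_rows, labels, v, reg, weights)

print(original, rescaled)
print(weights)
```

The two printed objective values should agree up to floating-point error, while the optimizer would work on the better-scaled columns. For an L1 penalty the same argument gives weights of `1 / sigma_j` instead of `1 / sigma_j**2`.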

In R, there is an option for this.
`standardize`
Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize. See details below for y standardization with family="gaussian".

Issue Links

contains

SPARK-6683 Handling feature scaling properly for GLMs (Resolved)

links to

[Github] Pull Request #7024 (holdenk)

Sub-Tasks

1.	Add a param for disabling of feature scaling, default to true	Resolved	Holden Karau
2.	Add an option to disable feature scaling in Linear Regression	Resolved	Holden Karau
3.	Disable feature scaling in Logistic Regression	Resolved	DB Tsai

People

Assignee: DB Tsai

Reporter: DB Tsai

Votes: 0

Watchers: 5

Dates

Created: 22/Jun/15 07:23

Updated: 05/Aug/15 01:32

Resolved: 05/Aug/15 01:32