Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.2.0
- Fix Version/s: None
Description
Currently LinearSVC in Spark only supports OWLQN as the optimizer (see https://issues.apache.org/jira/browse/SPARK-14709). I compared LBFGS and OWLQN on several public datasets and found that LBFGS converges much faster for LinearSVC in most cases.
The following table shows the number of training iterations until convergence for each optimizer, with the resulting F1 score in parentheses:
| Dataset | LBFGS with hinge | OWLQN with hinge | LBFGS with squared_hinge |
|---|---|---|---|
| news20.binary | 31 (0.99) | 413 (0.99) | 185 (0.99) |
| mushroom | 28 (1.0) | 170 (1.0) | 24 (1.0) |
| madelon | 143 (0.75) | 8129 (0.70) | 823 (0.74) |
| breast-cancer-scale | 15 (1.0) | 16 (1.0) | 15 (1.0) |
| phishing | 329 (0.94) | 231 (0.94) | 67 (0.94) |
| a1a (adult) | 466 (0.87) | 282 (0.87) | 77 (0.86) |
| a7a | 237 (0.84) | 372 (0.84) | 69 (0.84) |
Data source: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
Training code: `new LinearSVC().setMaxIter(10000).setTol(1e-6)`
LBFGS requires fewer iterations in most cases (phishing and a1a being the exceptions with hinge loss) and is probably a better default optimizer.
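For reference, a minimal sketch of the kind of harness behind the numbers above, using only the public LinearSVC API. The dataset path, train/test split, and +1/-1 to 0/1 label remapping are assumptions; the optimizer choice is internal to LinearSVC (reproducing the LBFGS columns requires a patched build), and the iteration counts in the table come from the optimizer's convergence log rather than from anything in this snippet.

```scala
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

object LinearSVCBenchmark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LinearSVC optimizer comparison")
      .getOrCreate()

    // Hypothetical local path to one of the LIBSVM binary datasets from the table.
    val raw = spark.read.format("libsvm").load("data/news20.binary")

    // The LIBSVM binary datasets typically label classes as +1/-1,
    // while LinearSVC expects 0/1 labels.
    val data = raw.withColumn("label",
      when(col("label") === 1.0, 1.0).otherwise(0.0))

    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    // Same settings as in the description: a high iteration cap and a tight
    // tolerance so the optimizer runs until convergence.
    val svc = new LinearSVC()
      .setMaxIter(10000)
      .setTol(1e-6)

    val model = svc.fit(train)

    // F1 score on the held-out split.
    val f1 = new MulticlassClassificationEvaluator()
      .setMetricName("f1")
      .evaluate(model.transform(test))

    println(s"f1 = $f1")
    spark.stop()
  }
}
```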
Issue Links
- relates to
  - SPARK-20503 ML 2.2 QA: API: Python API coverage (Resolved)
  - SPARK-20348 Support squared hinge loss (L2 loss) for LinearSVC (Resolved)
- links to