[SPARK-6682] Deprecate static train and use builder instead for Scala/Java - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 1.3.0
Fix Version/s: None
Component/s: MLlib
Labels: None

Description

In MLlib, we have for some time been unofficially moving away from the old static train() methods and towards the builder pattern. This JIRA is to discuss this move and (hopefully) make it official.

"Old static train()" API:

val myModel = NaiveBayes.train(myData, ...)

"New builder pattern" API:

val nb = new NaiveBayes().setLambda(0.1)
val myModel = nb.train(myData)

Pros of the builder pattern:

Much less code when algorithms have many parameters. Since Java does not support default arguments, we have needed many duplicated static train() methods, one for each prefix of the argument list.
Helps to enforce default parameters. Users should ideally not have to even think about setting parameters if they just want to try an algorithm quickly.
Matches spark.ml API
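To make the first two points concrete, here is a minimal sketch on a toy learner. It is not MLlib code: ToyLearner, ToyModel, the maxIter parameter, and the default values are hypothetical names chosen for illustration. The static style needs one overload per prefix of the argument list, while the builder keeps the defaults in a single place.

```scala
// Hypothetical toy learner, not MLlib code: all names and defaults are illustrative.
final case class ToyModel(lambda: Double, maxIter: Int, mean: Double)

// Builder style: defaults are declared once, and each setter returns
// this.type so that calls can be chained.
class ToyLearner {
  private var lambda: Double = 1.0
  private var maxIter: Int = 100

  def setLambda(value: Double): this.type = { lambda = value; this }
  def setMaxIter(value: Int): this.type = { maxIter = value; this }

  // "Training" here only records the parameters and the mean of the data.
  def train(data: Seq[Double]): ToyModel = {
    require(data.nonEmpty, "data must not be empty")
    ToyModel(lambda, maxIter, data.sum / data.size)
  }
}

// Static style: one overload per prefix of the argument list, so that
// Java callers (who have no default arguments) can omit trailing parameters.
// Note that the default values are repeated here.
object ToyLearner {
  def train(data: Seq[Double]): ToyModel =
    train(data, 1.0)

  def train(data: Seq[Double], lambda: Double): ToyModel =
    train(data, lambda, 100)

  def train(data: Seq[Double], lambda: Double, maxIter: Int): ToyModel =
    new ToyLearner().setLambda(lambda).setMaxIter(maxIter).train(data)
}
```

With three parameters the static style already needs three overloads, and each new parameter adds another; the builder only gains one setter.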

Cons of the builder pattern:

In Python APIs, static train methods are more "Pythonic."

Proposal:

Scala/Java: We should start deprecating the old static train() methods. We must keep them for API stability, but deprecating will help with API consistency, making it clear that everyone should use the builder pattern. As we deprecate them, we should make sure that the builder pattern supports all parameters.
Python: Keep static train methods.
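One way the Scala/Java part of the proposal could look is sketched below, assuming the static method is kept but marked with Scala's @deprecated annotation and made to delegate to the builder. SketchLearner, SketchModel, and the "x.y.z" version string are placeholders, not actual MLlib names or releases.

```scala
// Hypothetical sketch, not MLlib source: names and the version string are placeholders.
final case class SketchModel(lambda: Double, numPoints: Int)

class SketchLearner {
  private var lambda: Double = 1.0

  def setLambda(value: Double): this.type = { lambda = value; this }

  // Stand-in for real training: records the parameter and the data size.
  def train(data: Seq[Double]): SketchModel = SketchModel(lambda, data.size)
}

object SketchLearner {
  // Kept for API stability. Delegating to the builder means the two
  // entry points cannot drift apart in behavior.
  @deprecated("Use new SketchLearner().setLambda(...).train(data) instead", "x.y.z")
  def train(data: Seq[Double], lambda: Double): SketchModel =
    new SketchLearner().setLambda(lambda).train(data)
}
```

Existing callers of the static method keep compiling and see a deprecation warning that points them to the builder.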

CC: mengxr

Issue Links

is blocked by

SPARK-5256 Improving MLlib optimization APIs (Resolved)
SPARK-18303 CLONE - Improving MLlib optimization APIs (Resolved)

is related to

SPARK-7134 Add regParam and featureScaling options to Logistic regression 'train' methods (Closed)

People

Assignee: Unassigned
Reporter: Joseph K. Bradley
Votes: 0
Watchers: 6

Dates

Created: 02/Apr/15 18:21
Updated: 07/Nov/16 18:03
Resolved: 21/Apr/16 23:50