[SPARK-14311] Model persistence in SparkR 2.0 - ASF JIRA

Details

Type: Umbrella
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.0.0
Component/s: ML, SparkR
Labels: None
Target Version/s: 2.0.0

Description

In Spark 2.0, we are going to have four ML models in SparkR: GLMs, k-means, naive Bayes, and AFT survival regression. Users can fit models, get summaries, and make predictions. However, they cannot save or load the models yet.

ML models in SparkR are wrappers around ML pipelines, so it should be straightforward to implement model persistence. We need to think more about the API. R uses save/load for objects and datasets (which are also objects). It is possible to overload save for ML models, e.g., save.NaiveBayesWrapper, but I'm not sure whether load can be overloaded easily. I propose the following API:

model <- glm(formula, data = df)
ml.save(model, path, mode = "overwrite")
model2 <- ml.load(path)

We defined wrappers as S4 classes, so `ml.save` is an S4 method and `ml.load` is an S3 method (correct me if I'm wrong).
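
To make the dispatch question concrete, here is a minimal sketch in plain R (no Spark required) of how the proposed pair could be wired up. It is only an illustration under assumptions: `ToyModelWrapper`, the `"error"` default for `mode`, and the use of `saveRDS`/`readRDS` as the storage format are all hypothetical stand-ins, not the SparkR implementation. In this sketch `ml.save` is an S4 generic that dispatches on the wrapper class, while `ml.load` is written as an ordinary function, since its only argument is a path and there is no model object to dispatch on; it picks the class from metadata stored alongside the model.

```r
library(methods)

# Hypothetical stand-in for a SparkR model wrapper. The real wrappers hold a
# reference to a JVM pipeline model rather than plain R values.
setClass("ToyModelWrapper", representation(coefficients = "numeric"))

# S4 generic so that each wrapper class can supply its own save method.
# The "error" default for mode is an assumption made for this sketch.
setGeneric("ml.save", function(object, path, mode = "error") {
  standardGeneric("ml.save")
})

setMethod("ml.save", signature(object = "ToyModelWrapper"),
  function(object, path, mode = "error") {
    if (file.exists(path) && mode != "overwrite") {
      stop("Path already exists: ", path)
    }
    # Store the class name next to the payload so the loader knows what to build.
    saveRDS(list(class = as.character(class(object))[1],
                 coefficients = object@coefficients), path)
    invisible(NULL)
  })

# Ordinary function: the argument is just a path, so the wrapper class is
# chosen from the stored metadata instead of by method dispatch.
ml.load <- function(path) {
  stored <- readRDS(path)
  switch(stored$class,
    ToyModelWrapper = new("ToyModelWrapper", coefficients = stored$coefficients),
    stop("Unsupported model class: ", stored$class))
}

# Usage: save, overwrite, and load back.
path <- tempfile(fileext = ".rds")
model <- new("ToyModelWrapper", coefficients = c(0.5, -1.2))
ml.save(model, path)
ml.save(model, path, mode = "overwrite")
model2 <- ml.load(path)
```

Storing the class name with the model is one way around the "can load be overloaded" concern: a single loader covers every wrapper type, and supporting a new model only requires a new `ml.save` method plus one more branch in the loader.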

Issue Links

is related to: SPARK-6725 Model export/import for Pipeline API (Scala) (Resolved)

relates to: SPARK-14831 Make ML APIs in SparkR consistent (Resolved)

Sub-Tasks

There are no Sub-Tasks for this issue.

People

Assignee: Xiangrui Meng
Reporter: Xiangrui Meng
Votes: 0
Watchers: 6

Dates

Created: 31/Mar/16 21:56
Updated: 30/Apr/16 03:59
Resolved: 30/Apr/16 03:59