[SPARK-6725] Model export/import for Pipeline API (Scala) - ASF JIRA

Details

Type: Umbrella
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.3.0
Fix Version/s: 2.0.0
Component/s: ML
Labels: None
Target Version/s: 2.0.0

Description

This is an umbrella JIRA for adding model export/import to the spark.ml API. This JIRA is for adding the internal Saveable/Loadable API and Parquet-based format, not for other formats like PMML.

This will require the following steps:

1. Add export/import for all PipelineStages supported by spark.ml.
   - This will include some Transformers which are not Models.
   - These can use almost the same format as the spark.mllib model save/load functions, but the model metadata must store a different class name (marking the class as a spark.ml class).
2. After all PipelineStages support save/load, add an interface which forces future additions to support save/load.
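
As an illustration of the steps above, here is a minimal sketch in plain Scala. It is not the Spark implementation: the `Saveable` and `Loadable` traits, the `ToyScaler` class, and the plain-text `metadata` file are all hypothetical stand-ins (the issue itself targets a Parquet-based format). The sketch only shows the idea of a Transformer that is not a Model saving metadata that records its class name, and a loader checking that name.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical traits; the names follow the issue text, not the final Spark API.
trait Saveable {
  def save(dir: Path): Unit
}

trait Loadable[T] {
  def load(dir: Path): T
}

// A toy Transformer that is not a Model: it has a parameter but no fitted state.
final case class ToyScaler(factor: Double) extends Saveable {
  def transform(xs: Seq[Double]): Seq[Double] = xs.map(_ * factor)

  override def save(dir: Path): Unit = {
    Files.createDirectories(dir)
    // The metadata records the class name so that a loader can tell
    // which class (and therefore which API) wrote the files.
    val metadata = Seq(
      s"class=${classOf[ToyScaler].getName}",
      s"factor=$factor"
    ).mkString("\n")
    Files.write(dir.resolve("metadata"), metadata.getBytes(StandardCharsets.UTF_8))
  }
}

object ToyScaler extends Loadable[ToyScaler] {
  override def load(dir: Path): ToyScaler = {
    val text = new String(Files.readAllBytes(dir.resolve("metadata")), StandardCharsets.UTF_8)
    // Parse the key=value lines written by save.
    val kv = text.split("\n").map { line =>
      val parts = line.split("=", 2)
      parts(0) -> parts(1)
    }.toMap
    // Refuse to load files written by a different class.
    require(kv("class") == classOf[ToyScaler].getName, s"unexpected class: ${kv("class")}")
    ToyScaler(kv("factor").toDouble)
  }
}
```

Putting `load` on the companion object mirrors the usual Scala pattern of static-style factory methods; an interface like step 2 would then amount to requiring every new stage to mix in the save trait and provide such a loader.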

UPDATE: In spark.ml, we could save feature metadata using DataFrames. Other libraries and formats can support this, and it would be great if we could too. We could do either of the following:

- save() optionally takes a dataset (or schema), and load will return a (model, schema) pair.
- Models themselves save the input schema.

Both options would mean inheriting from new Saveable, Loadable types.
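
The first option can be sketched as follows, again in plain Scala rather than against Spark. Everything here is an assumption made for illustration: `ToySchema` stands in for a DataFrame schema, `ToyModel` for a fitted model, and `ToyModelIO` with its text files for the persistence layer. The point is only the shape of the signatures: `save` accepts an optional schema, and `load` returns a (model, schema) pair.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical stand-in for a DataFrame schema: (column name, type name) pairs.
final case class ToySchema(fields: Seq[(String, String)])

// Hypothetical fitted model; assumed to have at least one weight.
final case class ToyModel(weights: Seq[Double])

object ToyModelIO {
  private def write(path: Path, text: String): Unit =
    Files.write(path, text.getBytes(StandardCharsets.UTF_8))

  private def read(path: Path): String =
    new String(Files.readAllBytes(path), StandardCharsets.UTF_8)

  // The caller may pass a schema, which is stored next to the model data.
  def save(model: ToyModel, dir: Path, schema: Option[ToySchema] = None): Unit = {
    Files.createDirectories(dir)
    write(dir.resolve("weights"), model.weights.mkString(","))
    schema.foreach { s =>
      write(dir.resolve("schema"), s.fields.map { case (n, t) => s"$n:$t" }.mkString("\n"))
    }
  }

  // Returns a (model, schema) pair; the schema is None if none was saved.
  def load(dir: Path): (ToyModel, Option[ToySchema]) = {
    val model = ToyModel(read(dir.resolve("weights")).split(",").map(_.toDouble).toSeq)
    val schemaPath = dir.resolve("schema")
    val schema =
      if (Files.exists(schemaPath)) {
        val fields = read(schemaPath).split("\n").map { line =>
          val parts = line.split(":", 2)
          parts(0) -> parts(1)
        }.toSeq
        Some(ToySchema(fields))
      } else None
    (model, schema)
  }
}
```

Under the second option the schema would instead be a field of the model itself, so `load` could return just the model and callers would read the schema from it.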

UPDATE: DESIGN DOC: Here's a design doc which I wrote. If you have comments about the planned implementation, please comment in this JIRA. Thanks! https://docs.google.com/document/d/1RleM4QiKwdfZZHf0_G6FBNaF7_koc1Ui7qfMT1pf4IA/edit?usp=sharing

Issue Links

is related to

SPARK-11994	Word2VecModel load and save cause SparkException when model is bigger than spark.kryoserializer.buffer.max	Resolved
SPARK-13265	Refactoring of basic ML import/export for other file system besides HDFS	Resolved
SPARK-4587	Model export/import	Resolved
SPARK-5874	How to improve the current ML pipeline API?	Resolved
SPARK-11939	PySpark support model export/import for Pipeline API	Resolved

relates to

SPARK-14311	Model persistence in SparkR 2.0	Resolved

Sub-Tasks

1.	Model export/import for spark.ml: LogisticRegression	Resolved	Joseph K. Bradley
2.	Model export/import for spark.ml: HashingTF	Closed	Unassigned
3.	Model export/import for spark.ml: Normalizer	Closed	Unassigned
4.	Model export/import for spark.ml: estimators under ml.feature (I)	Resolved	Xiangrui Meng
5.	Model export/import for spark.ml: Tokenizer	Closed	Unassigned
6.	Model export/import for spark.ml: ALS	Resolved	Joseph K. Bradley
7.	Model export/import for spark.ml: LinearRegression	Resolved	Wenjian Huang
8.	Model export/import for spark.ml: CrossValidator	Resolved	Joseph K. Bradley
9.	JSON serialization of standard params	Resolved	Xiangrui Meng
10.	Model import/export for non-meta estimators and transformers	Resolved	Xiangrui Meng
11.	Model export/import for spark.ml: Pipeline and PipelineModel	Resolved	Joseph K. Bradley
12.	Refactoring of basic ML import/export	Resolved	Joseph K. Bradley
13.	Refactoring to create template for Estimator, Model pairs	Resolved	Joseph K. Bradley
14.	JSON serialization of Param[Vector]	Resolved	Xiangrui Meng
15.	Model export/import for spark.ml: all basic Transformers	Resolved	Joseph K. Bradley
16.	Model export/import for spark.ml: estimators under ml.feature (II)	Resolved	Yanbo Liang
17.	Renames traits to avoid collision with java.util.* and add use default traits to simplify the impl	Resolved	Xiangrui Meng
18.	Cleanups to existing Readers and Writers	Resolved	Joseph K. Bradley
19.	Model export/import for spark.ml: AFTSurvivalRegression and IsotonicRegression	Resolved	Xusen Yin
20.	Model export/import for spark.ml: LDA	Resolved	yuhao yang
21.	Model export/import for spark.ml: k-means & naive Bayes	Resolved	Xusen Yin
22.	Model export/import for spark.ml: Multilayer Perceptron	Resolved	Xusen Yin
23.	Model export/import for spark.ml: DecisionTreeClassifier,Regressor	Resolved	Joseph K. Bradley
24.	Model export/import for RFormula and RFormulaModel	Resolved	Xusen Yin
25.	Model export/import for spark.ml: OneVsRest	Resolved	Xusen Yin
26.	Model export/import for spark.ml: TrainValidationSplit	Resolved	Xusen Yin
27.	Create user guide section explaining export/import	Resolved	Bill Chambers
28.	Model export/import for spark.ml: ElementwiseProduct	Resolved	Xusen Yin
29.	Model export/import for spark.ml: BisectingKMeans	Resolved	yuhao yang
30.	Model export/import for spark.ml: GBTs	Resolved	Yanbo Liang
31.	Model export/import for spark.ml: RandomForests	Resolved	Gayathri Murali

People

Assignee: Joseph K. Bradley
Reporter: Joseph K. Bradley
Votes: 13
Watchers: 24

Dates

Created: 06/Apr/15 20:50
Updated: 13/Apr/16 18:32
Resolved: 13/Apr/16 18:32