SPARK-6725

Model export/import for Pipeline API (Scala)

    Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 2.0.0
    • Component/s: ML
    • Labels: None

      Description

      This is an umbrella JIRA for adding model export/import to the spark.ml API. This JIRA is for adding the internal Saveable/Loadable API and Parquet-based format, not for other formats like PMML.

      This will require the following steps:

      • Add export/import for all PipelineStages supported by spark.ml
        • This will include some Transformers which are not Models.
        • These can use almost the same format as the spark.mllib model save/load functions, but the model metadata must store a different class name (marking the class as a spark.ml class).
      • After all PipelineStages support save/load, add an interface which forces future additions to support save/load.
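
The steps above can be sketched outside Spark. The following is a minimal, language-neutral illustration in Python, not the spark.ml implementation: the names `Saveable`, `write_metadata`, `read_metadata`, `ToyBinarizer`, and the class-name string `example.ml.ToyBinarizer` are all hypothetical, and JSON stands in for the Parquet-based format the ticket describes. It shows two ideas: metadata that records the class name so a loader can reject files written for a different class, and an abstract interface that forces every new stage to implement save.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class Saveable(ABC):
    """Hypothetical interface: a stage that omits save() cannot be instantiated."""

    @abstractmethod
    def save(self, path):
        """Write this stage's metadata (and any data) under `path`."""


def write_metadata(path, class_name, params):
    # Store the class name next to the params so load() can verify it later.
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "metadata.json"), "w") as f:
        json.dump({"class": class_name, "params": params}, f)


def read_metadata(path, expected_class):
    with open(os.path.join(path, "metadata.json")) as f:
        meta = json.load(f)
    # Reject metadata written by a different class (e.g. an older API's model).
    if meta["class"] != expected_class:
        raise ValueError(f"expected {expected_class}, found {meta['class']}")
    return meta


class ToyBinarizer(Saveable):
    """A toy Transformer that is not a Model: it has params but no fitted data."""

    CLASS_NAME = "example.ml.ToyBinarizer"  # hypothetical class name

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def save(self, path):
        write_metadata(path, self.CLASS_NAME, {"threshold": self.threshold})

    @classmethod
    def load(cls, path):
        meta = read_metadata(path, cls.CLASS_NAME)
        return cls(**meta["params"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ToyBinarizer(0.5).save(d)
        print(ToyBinarizer.load(d).threshold)
```

A subclass of `Saveable` that does not define `save` raises `TypeError` on instantiation, which is one way an interface can force future additions to support save/load.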

      UPDATE: In spark.ml, we could save feature metadata using DataFrames. Other libraries and formats can support this, and it would be great if we could too. We could do either of the following:

      • save() optionally takes a dataset (or schema), and load will return a (model, schema) pair.
      • Models themselves save the input schema.

      Both options would mean inheriting from new Saveable, Loadable types.
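
As a rough illustration of the first option, here is a minimal Python sketch, again not the spark.ml API. `ToyModel`, the file names, and the schema representation (a list of name/type dictionaries) are assumptions made for the example, and JSON replaces the Parquet-based format. `save()` takes an optional schema, and `load()` returns a (model, schema) pair where the schema is `None` if none was saved.

```python
import json
import os
import tempfile


class ToyModel:
    """Hypothetical model holding a list of weights."""

    def __init__(self, weights):
        self.weights = list(weights)

    def save(self, path, schema=None):
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "model.json"), "w") as f:
            json.dump({"weights": self.weights}, f)
        # The schema is optional: write it only when the caller supplies one.
        if schema is not None:
            with open(os.path.join(path, "schema.json"), "w") as f:
                json.dump(schema, f)

    @classmethod
    def load(cls, path):
        with open(os.path.join(path, "model.json")) as f:
            model = cls(json.load(f)["weights"])
        schema_file = os.path.join(path, "schema.json")
        schema = None
        if os.path.exists(schema_file):
            with open(schema_file) as f:
                schema = json.load(f)
        # Return a (model, schema) pair, as in the first option above.
        return model, schema


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        input_schema = [{"name": "features", "type": "vector"}]
        ToyModel([0.1, 0.2]).save(d, schema=input_schema)
        model, schema = ToyModel.load(d)
        print(model.weights, schema)
```

Under the second option, the schema would instead be a field of the model itself and be written unconditionally by `save()`, so `load()` would return only the model.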

      UPDATE: DESIGN DOC: Here's a design doc which I wrote. If you have comments about the planned implementation, please comment in this JIRA. Thanks! https://docs.google.com/document/d/1RleM4QiKwdfZZHf0_G6FBNaF7_koc1Ui7qfMT1pf4IA/edit?usp=sharing

          Sub-Tasks

          1. Model export/import for spark.ml: LogisticRegression (Sub-task, Resolved, Joseph K. Bradley)
          2. Model export/import for spark.ml: HashingTF (Sub-task, Closed, Unassigned)
          3. Model export/import for spark.ml: Normalizer (Sub-task, Closed, Unassigned)
          4. Model export/import for spark.ml: estimators under ml.feature (I) (Sub-task, Resolved, Xiangrui Meng)
          5. Model export/import for spark.ml: Tokenizer (Sub-task, Closed, Unassigned)
          6. Model export/import for spark.ml: ALS (Sub-task, Resolved, Joseph K. Bradley)
          7. Model export/import for spark.ml: LinearRegression (Sub-task, Resolved, Wenjian Huang)
          8. Model export/import for spark.ml: CrossValidator (Sub-task, Resolved, Joseph K. Bradley)
          9. JSON serialization of standard params (Sub-task, Resolved, Xiangrui Meng)
          10. Model import/export for non-meta estimators and transformers (Sub-task, Resolved, Xiangrui Meng)
          11. Model export/import for spark.ml: Pipeline and PipelineModel (Sub-task, Resolved, Joseph K. Bradley)
          12. Refactoring of basic ML import/export (Sub-task, Resolved, Joseph K. Bradley)
          13. Refactoring to create template for Estimator, Model pairs (Sub-task, Resolved, Joseph K. Bradley)
          14. JSON serialization of Param[Vector] (Sub-task, Resolved, Xiangrui Meng)
          15. Model export/import for spark.ml: all basic Transformers (Sub-task, Resolved, Joseph K. Bradley)
          16. Model export/import for spark.ml: estimators under ml.feature (II) (Sub-task, Resolved, Yanbo Liang)
          17. Renames traits to avoid collision with java.util.* and add use default traits to simplify the impl (Sub-task, Resolved, Xiangrui Meng)
          18. Cleanups to existing Readers and Writers (Sub-task, Resolved, Joseph K. Bradley)
          19. Model export/import for spark.ml: AFTSurvivalRegression and IsotonicRegression (Sub-task, Resolved, Xusen Yin)
          20. Model export/import for spark.ml: LDA (Sub-task, Resolved, yuhao yang)
          21. Model export/import for spark.ml: k-means & naive Bayes (Sub-task, Resolved, Xusen Yin)
          22. Model export/import for spark.ml: Multilayer Perceptron (Sub-task, Resolved, Xusen Yin)
          23. Model export/import for spark.ml: DecisionTreeClassifier,Regressor (Sub-task, Resolved, Joseph K. Bradley)
          24. Model export/import for RFormula and RFormulaModel (Sub-task, Resolved, Xusen Yin)
          25. Model export/import for spark.ml: OneVsRest (Sub-task, Resolved, Xusen Yin)
          26. Model export/import for spark.ml: TrainValidationSplit (Sub-task, Resolved, Xusen Yin)
          27. Create user guide section explaining export/import (Sub-task, Resolved, Bill Chambers)
          28. Model export/import for spark.ml: ElementwiseProduct (Sub-task, Resolved, Xusen Yin)
          29. Model export/import for spark.ml: BisectingKMeans (Sub-task, Resolved, yuhao yang)
          30. Model export/import for spark.ml: GBTs (Sub-task, Resolved, Yanbo Liang)
          31. Model export/import for spark.ml: RandomForests (Sub-task, Resolved, Gayathri Murali)

              People

              • Assignee: Joseph K. Bradley (josephkb)
              • Reporter: Joseph K. Bradley (josephkb)
              • Votes: 13
              • Watchers: 33
