Spark / SPARK-6725

Model export/import for Pipeline API (Scala)


Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 2.0.0
    • Component/s: ML
    • Labels: None

    Description

      This is an umbrella JIRA for adding model export/import to the spark.ml API. This JIRA is for adding the internal Saveable/Loadable API and Parquet-based format, not for other formats like PMML.

      This will require the following steps:

      • Add export/import for all PipelineStages supported by spark.ml
        • This will include some Transformers which are not Models.
        • These can use almost the same format as the spark.mllib model save/load functions, but the model metadata must store a different class name (marking the class as a spark.ml class).
      • After all PipelineStages support save/load, add an interface which forces future additions to support save/load.
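
      The steps above could be sketched roughly as follows. This is a minimal illustration in plain Scala with no Spark dependency, not the design adopted in this JIRA: the trait names Saveable and Loadable follow the description, but their signatures, the Thresholder stage, and the plain-text metadata file are all assumptions made for the example (the JIRA targets a Parquet-based format).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical trait: every stage must be able to write itself to a directory.
trait Saveable {
  def save(dir: Path): Unit
}

// Hypothetical companion-side trait for reading a stage back.
trait Loadable[T] {
  def load(dir: Path): T
}

// Toy Transformer that is not a Model: it only holds a parameter.
final case class Thresholder(threshold: Double) extends Saveable {
  def save(dir: Path): Unit = {
    Files.createDirectories(dir)
    // The metadata records the class name so a loader can check
    // which class wrote the files.
    val metadata = Seq(
      s"class=${classOf[Thresholder].getName}",
      s"threshold=$threshold"
    ).mkString("\n")
    Files.write(dir.resolve("metadata"), metadata.getBytes(StandardCharsets.UTF_8))
  }
}

object Thresholder extends Loadable[Thresholder] {
  def load(dir: Path): Thresholder = {
    val text = new String(Files.readAllBytes(dir.resolve("metadata")), StandardCharsets.UTF_8)
    // Parse "key=value" lines into a map.
    val fields = text.split("\n").map { line =>
      val Array(k, v) = line.split("=", 2)
      k -> v
    }.toMap
    // Refuse to load metadata written by a different class.
    require(fields("class") == classOf[Thresholder].getName,
      s"Unexpected class: ${fields("class")}")
    Thresholder(fields("threshold").toDouble)
  }
}
```

      Requiring every stage to extend Saveable, and every companion object to extend Loadable, is one way an interface could force future additions to support save/load.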

      UPDATE: In spark.ml, we could save feature metadata using DataFrames. Other libraries and formats can support this, and it would be great if we could too. We could do either of the following:

      • save() optionally takes a dataset (or schema), and load will return a (model, schema) pair.
      • Models themselves save the input schema.

      Both options would mean inheriting from new Saveable, Loadable types.
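
      As a rough sketch of the first option, save() can accept an optional schema and load() can return a (model, schema) pair. Everything named here (Field, ScaleModel, ScaleModelIO, and the plain-text file layout) is hypothetical and stands in for the DataFrame schema and Parquet files a real implementation would use.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical stand-in for one DataFrame schema field (name plus type name).
final case class Field(name: String, dataType: String)

// Toy model holding a single coefficient.
final case class ScaleModel(factor: Double)

object ScaleModelIO {
  // save() optionally takes the input schema and stores it next to the model.
  def save(model: ScaleModel, dir: Path, schema: Option[Seq[Field]] = None): Unit = {
    Files.createDirectories(dir)
    Files.write(dir.resolve("model"), model.factor.toString.getBytes(StandardCharsets.UTF_8))
    schema.foreach { fields =>
      val lines = fields.map(f => s"${f.name}:${f.dataType}").mkString("\n")
      Files.write(dir.resolve("schema"), lines.getBytes(StandardCharsets.UTF_8))
    }
  }

  // load() returns a (model, schema) pair; the schema is None if none was saved.
  def load(dir: Path): (ScaleModel, Option[Seq[Field]]) = {
    def read(p: Path): String = new String(Files.readAllBytes(p), StandardCharsets.UTF_8)
    val model = ScaleModel(read(dir.resolve("model")).toDouble)
    val schemaPath = dir.resolve("schema")
    val schema =
      if (Files.exists(schemaPath))
        Some(read(schemaPath).split("\n").toList.filter(_.nonEmpty).map { line =>
          val Array(name, dataType) = line.split(":", 2)
          Field(name, dataType)
        })
      else None
    (model, schema)
  }
}
```

      Under the second option, the schema would instead be a field of the model itself, so save() and load() would keep their simpler signatures.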

      UPDATE: DESIGN DOC: Here's a design doc which I wrote. If you have comments about the planned implementation, please comment in this JIRA. Thanks! https://docs.google.com/document/d/1RleM4QiKwdfZZHf0_G6FBNaF7_koc1Ui7qfMT1pf4IA/edit?usp=sharing

            People

              Assignee: Joseph K. Bradley (josephkb)
              Reporter: Joseph K. Bradley (josephkb)
              Votes: 13
              Watchers: 24
