Spark / SPARK-17025

Cannot persist PySpark ML Pipeline model that includes custom Transformer

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.3.0
    • Component/s: ML, PySpark
    • Labels: None

    Description

      Following the example in this Databricks blog post under "Python tuning", I'm trying to save an ML Pipeline model.

      This pipeline, however, includes a custom transformer. When I try to save the model, the operation fails because the custom transformer doesn't have a _to_java attribute.

      Traceback (most recent call last):
        File ".../file.py", line 56, in <module>
          model.bestModel.save('model')
        File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 222, in save
        File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 217, in write
        File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/pyspark.zip/pyspark/ml/util.py", line 93, in __init__
        File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 254, in _to_java
      AttributeError: 'PeoplePairFeaturizer' object has no attribute '_to_java'
      

      Looking at the source code for ml/base.py, I see that not even the base Transformer class has such an attribute.
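To illustrate the failure mode, here is a minimal pure-Python sketch. It does not use PySpark: `JavaBackedStage`, `PythonOnlyStage`, `SketchPipelineModel`, and `try_convert` are hypothetical stand-ins. It assumes, consistent with the traceback, that the pipeline's `_to_java` asks each stage for its own `_to_java`.

```python
class JavaBackedStage:
    """Stand-in for a stage that wraps a JVM object and can convert itself."""

    def _to_java(self):
        return "java-stage"


class PythonOnlyStage:
    """Stand-in for a custom transformer written purely in Python."""

    def transform(self, rows):
        return rows


class SketchPipelineModel:
    """Stand-in for a pipeline model whose save path converts every stage."""

    def __init__(self, stages):
        self.stages = stages

    def _to_java(self):
        # Each stage is asked for its JVM counterpart. A stage that does not
        # define _to_java raises AttributeError at this point.
        return [stage._to_java() for stage in self.stages]


def try_convert(model):
    """Return the converted stages, or the error message if conversion fails."""
    try:
        return model._to_java()
    except AttributeError as exc:
        return str(exc)


ok = try_convert(SketchPipelineModel([JavaBackedStage()]))
failed = try_convert(SketchPipelineModel([JavaBackedStage(), PythonOnlyStage()]))
print(ok)
print(failed)
```

A single Python-only stage is enough to make the whole conversion fail, which matches the behavior reported for the pipeline containing `PeoplePairFeaturizer`.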

      I'm assuming this is missing functionality that is intended to be patched up (e.g., like this).

      I'm not sure if there is an existing JIRA for this (my searches didn't turn up clear results).

            People

              Assignee: Ajay Saini (ajaysaini)
              Reporter: Nicholas Chammas (nchammas)
              Votes: 7
              Watchers: 15
