Description
Following the example in this Databricks blog post under "Python tuning", I'm trying to save an ML Pipeline model.
This pipeline, however, includes a custom transformer. When I try to save the model, the operation fails because the custom transformer doesn't have a _to_java attribute.
Traceback (most recent call last):
  File ".../file.py", line 56, in <module>
    model.bestModel.save('model')
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 222, in save
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 217, in write
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/pyspark.zip/pyspark/ml/util.py", line 93, in __init__
  File "/usr/local/Cellar/apache-spark/2.0.0/libexec/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 254, in _to_java
AttributeError: 'PeoplePairFeaturizer' object has no attribute '_to_java'
Looking at the source code for ml/base.py, I see that not even the base Transformer class has such an attribute.
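To make the failure easier to pin down before calling save, here is a minimal sketch of a check for stages that lack a _to_java method. It relies only on what the traceback shows, namely that the pipeline's own _to_java looks up _to_java on a stage. The helper name stages_without_to_java and the stand-in class JavaBackedStage are hypothetical and not part of pyspark; the PeoplePairFeaturizer here is a placeholder for the real custom transformer.

```python
def stages_without_to_java(stages):
    """Return class names of stages that lack a callable `_to_java` method.

    Hypothetical helper (not a pyspark API): it only duck-types each stage,
    mirroring the attribute lookup that fails in the traceback.
    """
    return [
        type(stage).__name__
        for stage in stages
        if not callable(getattr(stage, "_to_java", None))
    ]


# Stand-in classes for illustration only; they are not pyspark classes.
class JavaBackedStage:
    """Mimics a stage that can produce a JVM counterpart."""

    def _to_java(self):
        return object()  # placeholder for a JVM object handle


class PeoplePairFeaturizer:
    """Placeholder for the pure-Python custom transformer: no `_to_java`."""

    def transform(self, dataset):
        return dataset


missing = stages_without_to_java([JavaBackedStage(), PeoplePairFeaturizer()])
print(missing)  # ['PeoplePairFeaturizer']
```

With a real fitted pipeline, the same helper could presumably be called on model.bestModel.stages, assuming bestModel is a PipelineModel as the traceback suggests.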
I assume this is missing functionality that is meant to be added later (e.g., like this).
I'm not sure if there is an existing JIRA for this (my searches didn't turn up clear results).
Issue Links
- is related to
  - SPARK-21542 Helper functions for custom Python Persistence (Resolved)
  - SPARK-24632 Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence (Open)
  - SPARK-15574 Python meta-algorithms in Scala (Resolved)
- relates to
  - SPARK-11939 PySpark support model export/import for Pipeline API (Resolved)