Description
When fitting a PySpark Pipeline that has no stages, it should behave as an identity transformer. Instead, the following error is raised:
Traceback (most recent call last):
  File "./spark/python/pyspark/ml/base.py", line 64, in fit
    return self._fit(dataset)
  File "./spark/python/pyspark/ml/pipeline.py", line 99, in _fit
    for stage in stages:
TypeError: 'NoneType' object is not iterable
The stages param needs to default to an empty list, and getStages should call getOrDefault.
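A minimal sketch of the intended behavior, assuming a toy class rather than PySpark's real one: `ToyPipeline`, `_defaults`, `_paramMap` and `fit_transform` are hypothetical names, params are keyed by plain strings instead of `Param` objects, and fit and transform are collapsed into one step. It only shows why an empty-list default read through `getOrDefault` turns a stage-less pipeline into an identity transform.

```python
class ToyPipeline:
    """Hypothetical stand-in for pyspark.ml.Pipeline, reduced to the stages param."""

    def __init__(self, stages=None):
        # Per-instance defaults: stages falls back to an empty list, not None.
        self._defaults = {"stages": []}
        self._paramMap = {}
        if stages is not None:
            self._paramMap["stages"] = stages

    def getOrDefault(self, name):
        # An explicitly set value takes precedence over the default.
        if name in self._paramMap:
            return self._paramMap[name]
        return self._defaults[name]

    def getStages(self):
        # Going through getOrDefault means an unset param yields [] instead of None.
        return self.getOrDefault("stages")

    def fit_transform(self, dataset):
        # Toy stages are plain callables; fit and transform are collapsed into one
        # step for brevity. With no stages the loop never runs, so the input is
        # returned unchanged, i.e. the pipeline acts as an identity transformer.
        for stage in self.getStages():
            dataset = stage(dataset)
        return dataset


print(ToyPipeline().fit_transform([1, 2, 3]))  # prints [1, 2, 3]
```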
Also, because the default value None is replaced with an empty list [] only by rebinding the local variable stages, the value that is actually forwarded as a keyword argument never changes. Instead, the kwargs value should be updated directly when stages is None.
For example, this:

    if stages is None:
        stages = []

should be this:

    if stages is None:
        kwargs['stages'] = []
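The difference between rebinding the local name and updating kwargs can be reproduced with a simplified, hypothetical `keyword_only` decorator. It is modeled loosely on PySpark's decorator of the same name, which records the caller's keyword arguments before the constructor body runs; where the real one stores them is an assumption here, and `setParams` is replaced by copying the dict into a hypothetical `params` attribute.

```python
import functools


def keyword_only(func):
    """Simplified, hypothetical decorator: records the caller's keyword
    arguments on the instance before the wrapped function runs."""

    @functools.wraps(func)
    def wrapper(self, **kwargs):
        self._input_kwargs = kwargs
        return func(self, **kwargs)

    return wrapper


class Buggy:
    @keyword_only
    def __init__(self, stages=None):
        if stages is None:
            stages = []  # rebinds a local name only; the recorded kwargs are untouched
        kwargs = self._input_kwargs
        self.params = dict(kwargs)  # stand-in for self.setParams(**kwargs)


class Fixed:
    @keyword_only
    def __init__(self, stages=None):
        kwargs = self._input_kwargs
        if stages is None:
            kwargs["stages"] = []  # updates the dict that is actually forwarded
        self.params = dict(kwargs)  # stand-in for self.setParams(**kwargs)


print(Buggy().params)  # prints {}
print(Fixed().params)  # prints {'stages': []}
```

With the buggy version, omitting stages forwards nothing and passing stages=None forwards None, so the empty list is lost in both cases.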
However, since the Scala implementation has no default value for stages, assigning one here is unnecessary and should be cleaned up. The Python docstrings should also state more clearly that stages must be a list.
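One way the docstrings could spell out the list requirement is sketched below, assuming a hypothetical `ToyPipelineNoDefault` class that is not PySpark's Pipeline. In line with the Scala side it assigns no default; `None` in the signature is only a "not provided" sentinel. The `isinstance` check in `setStages` is an extra assumption added for illustration, not something the issue asks for.

```python
class ToyPipelineNoDefault:
    """Hypothetical sketch of a pipeline, not PySpark's Pipeline.

    :param stages: a list of pipeline stages (transformers or estimators).
        Must be a list; no default is assigned, matching the Scala API.
    """

    def __init__(self, stages=None):
        # None is only a sentinel meaning "not provided"; nothing is defaulted.
        self._paramMap = {}
        if stages is not None:
            self.setStages(stages)

    def setStages(self, value):
        """Set the pipeline stages.

        :param value: a list of transformers or estimators
        :return: this pipeline instance
        """
        # Hypothetical extra check: reject anything that is not a list.
        if not isinstance(value, list):
            raise TypeError("stages must be a list, got %s" % type(value).__name__)
        self._paramMap["stages"] = value
        return self

    def getStages(self):
        """Return the list of stages (raises KeyError in this toy if never set)."""
        return self._paramMap["stages"]
```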
Issue Links
- Is contained by: SPARK-14771 Python ML Param and UID issues (Resolved)