[SPARK-15018] PySpark ML Pipeline raises unclear error when no stages set - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.1.0
Component/s: ML, PySpark
Labels: None

Description

When fitting a PySpark Pipeline with no stages, it should work as an identity transformer. Instead the following error is raised:

Traceback (most recent call last):
  File "./spark/python/pyspark/ml/base.py", line 64, in fit
    return self._fit(dataset)
  File "./spark/python/pyspark/ml/pipeline.py", line 99, in _fit
    for stage in stages:
TypeError: 'NoneType' object is not iterable

The param stages needs to default to an empty list, and getStages should call getOrDefault so that the default is returned when no stages have been set.
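The suggested behavior can be sketched in plain Python. This is a simplified stand-in, not PySpark's actual code: the class SketchPipeline and its methods get_or_default, get_stages and fit are hypothetical names, stages are modeled as plain callables, and the dataset is a plain list.

```python
# Simplified stand-in for the param lookup; this is not PySpark's actual code.
class SketchPipeline:
    """Hypothetical pipeline that keeps explicit values and defaults apart."""

    def __init__(self, stages=None):
        self._param_map = {}                # values set explicitly by the caller
        self._default_map = {"stages": []}  # default: an empty list of stages
        if stages is not None:
            self._param_map["stages"] = list(stages)

    def get_or_default(self, name):
        # Prefer an explicitly set value, otherwise fall back to the default.
        if name in self._param_map:
            return self._param_map[name]
        return self._default_map[name]

    def get_stages(self):
        return self.get_or_default("stages")

    def fit(self, dataset):
        # With no stages the loop body never runs, so the input comes back unchanged.
        for stage in self.get_stages():
            dataset = stage(dataset)
        return dataset


rows = [1, 2, 3]
identity_result = SketchPipeline().fit(rows)
doubled_result = SketchPipeline(stages=[lambda d: [x * 2 for x in d]]).fit(rows)
```

Because the lookup falls back to an empty list, the loop in fit never sees None, and a pipeline without stages behaves as an identity transformer.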

Also, the default value of None is changed to an empty list [] inside the constructor, but this never changes the value that was passed in as a keyword argument. Instead, the kwargs value should be changed directly if stages is None.

For example, this:

if stages is None:
    stages = []

should be this:

if stages is None:
    kwargs['stages'] = []
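The difference between the two snippets can be illustrated with a small self-contained sketch. It assumes the constructor forwards a dict of keyword arguments captured by a decorator, loosely modeled on that pattern. The names keyword_only_sketch, RebindLocal and UpdateKwargs are hypothetical and are not part of PySpark.

```python
import functools


def keyword_only_sketch(func):
    """Hypothetical decorator that records the keyword arguments of a call."""
    @functools.wraps(func)
    def wrapper(self, **kwargs):
        self._input_kwargs = kwargs  # captured before func's body runs
        return func(self, **kwargs)
    return wrapper


class RebindLocal:
    @keyword_only_sketch
    def __init__(self, stages=None):
        if stages is None:
            stages = []              # rebinds the local name only
        self.params = dict(self._input_kwargs)


class UpdateKwargs:
    @keyword_only_sketch
    def __init__(self, stages=None):
        kwargs = self._input_kwargs
        if stages is None:
            kwargs["stages"] = []    # changes the dict that is actually forwarded
        self.params = dict(kwargs)


rebind_params = RebindLocal(stages=None).params
update_params = UpdateKwargs(stages=None).params
```

In this sketch rebind_params still holds None for stages, while update_params holds the empty list, because only the second constructor modifies the captured dict.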

However, since there is no default value in the Scala implementation, assigning a default here is not needed and should be cleaned up. The pydocs should better indicate that stages is required to be a list.

Issue Links

Is contained by

SPARK-14771 Python ML Param and UID issues

Resolved

links to

[Github] Pull Request #12790 (BryanCutler)

People

Assignee: Bryan Cutler
Reporter: Bryan Cutler
Shepherd: Yanbo Liang
Votes: 0
Watchers: 2

Dates

Created: 29/Apr/16 22:07
Updated: 20/Aug/16 06:47
Resolved: 20/Aug/16 06:47