[SPARK-29691] Estimator fit method fails to copy params (in PySpark) - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.4.4
Fix Version/s: 3.0.0
Component/s: PySpark
Labels: None

Description

The Estimator `fit` method is supposed to copy a dictionary of params, overwriting the estimator's previous values, before fitting the model. However, the parameter values are not updated. This was observed in PySpark, but the bug may lie in the underlying Java objects, since the PySpark code itself appears to function correctly. (The `copy` method that interacts with Java is actually implemented in `Params`.)

For example, the code below prints

Before: 0.8
After: 0.8

but After should be 0.75.

from pyspark.ml.classification import LogisticRegression

# Load training data
training = spark \
    .read \
    .format("libsvm") \
    .load("data/mllib/sample_multiclass_classification_data.txt")

lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
print("Before:", lr.getOrDefault("elasticNetParam"))

# Fit the model, but with an updated parameter setting:
lrModel = lr.fit(training, params={"elasticNetParam": 0.75})

print("After:", lr.getOrDefault("elasticNetParam"))
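
The copy-then-fit behaviour that the description expects can be sketched in plain Python, so that it runs without Spark. This is only an illustration under stated assumptions: `ToyEstimator` and its dict-based "model" are hypothetical stand-ins, not PySpark classes, and the sketch shows only that an override passed to `fit` should reach the fitting step.

```python
import copy


class ToyEstimator:
    """Hypothetical stand-in for an estimator; not the PySpark class."""

    def __init__(self, **params):
        self._params = dict(params)

    def getOrDefault(self, name):
        return self._params[name]

    def copy(self, extra=None):
        # Clone the estimator, then overlay the override values on the clone.
        that = copy.copy(self)
        that._params = dict(self._params)
        that._params.update(extra or {})
        return that

    def _fit(self, dataset):
        # A real estimator would train here; this sketch only records
        # the parameter values that were in effect during fitting.
        return {"params_used": dict(self._params), "n_rows": len(dataset)}

    def fit(self, dataset, params=None):
        # Apply overrides to a copy, then fit with that copy.
        if params:
            return self.copy(params)._fit(dataset)
        return self._fit(dataset)


est = ToyEstimator(maxIter=10, regParam=0.3, elasticNetParam=0.8)
model = est.fit([1, 2, 3], params={"elasticNetParam": 0.75})
print("Used for fitting:", model["params_used"]["elasticNetParam"])
```

In this sketch the value recorded by the fitting step is the override, while params that were not overridden (such as `maxIter`) carry over from the original estimator.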

Issue Links

links to

GitHub Pull Request #26527

People

Assignee: John Bauer

Reporter: John Bauer

Votes: 0

Watchers: 3

Dates

Created: 31/Oct/19 20:35

Updated: 19/Nov/19 22:18

Resolved: 19/Nov/19 22:16