SPARK-29691

Estimator fit method fails to copy params (in PySpark)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.4
    • Fix Version/s: 3.0.0
    • Component/s: PySpark
    • Labels: None

    Description

      The Estimator `fit` method is supposed to copy a dictionary of params, overwriting the estimator's previous values, before fitting the model. However, the parameter values are not updated. This was observed in PySpark, but the problem may lie in the Java objects, since the PySpark code appears to be functioning correctly. (The `copy` method that interacts with Java is actually implemented in `Params`.)

      For example, the code below prints

      Before: 0.8
      After: 0.8

      but After should be 0.75

      from pyspark.ml.classification import LogisticRegression
      
      # Load training data (assumes an active SparkSession named `spark`, as in the pyspark shell)
      training = spark \
          .read \
          .format("libsvm") \
          .load("data/mllib/sample_multiclass_classification_data.txt")
      
      lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
      print("Before:", lr.getOrDefault("elasticNetParam"))
      
      # Fit the model, but with an updated parameter setting:
      lrModel = lr.fit(training, params={"elasticNetParam": 0.75})
      
      print("After:", lr.getOrDefault("elasticNetParam"))
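
The copy-then-fit behavior described above can be sketched without Spark. This is a toy illustration only: `ToyEstimator` and `ToyModel` are hypothetical names, not PySpark classes, and the sketch assumes that `fit` applies the `params` overrides to a copy of the estimator rather than to the estimator in place. Under that assumption the override shows up on the fitted model, so the model may be a more reliable place to check than the original estimator.

```python
# Toy, pure-Python illustration. ToyEstimator and ToyModel are hypothetical
# stand-ins and do not reproduce PySpark's implementation.


class ToyModel:
    def __init__(self, params):
        # Snapshot of the parameter values the model was fitted with.
        self.params = dict(params)


class ToyEstimator:
    def __init__(self, **params):
        self.params = dict(params)

    def copy(self, extra=None):
        # Return a new estimator whose params are overridden by `extra`.
        merged = dict(self.params)
        merged.update(extra or {})
        return ToyEstimator(**merged)

    def fit(self, dataset, params=None):
        # Assumption: overrides are applied to a copy, and the copy is fitted.
        if params:
            return self.copy(params)._fit(dataset)
        return self._fit(dataset)

    def _fit(self, dataset):
        # No real training here; just record the params in effect.
        return ToyModel(self.params)


est = ToyEstimator(maxIter=10, regParam=0.3, elasticNetParam=0.8)
model = est.fit(dataset=None, params={"elasticNetParam": 0.75})

print("Estimator:", est.params["elasticNetParam"])
print("Model:", model.params["elasticNetParam"])
```

In this sketch the estimator keeps 0.8 while the model reports 0.75. Whether the real `fit` is meant to update the estimator itself is the question this report raises.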
      

            People

              Assignee: John Bauer (JohnHBauer)
              Reporter: John Bauer (JohnHBauer)