SPARK-33592

PySpark ML Validator params in estimatorParamMaps may be lost after saving and reloading


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0, 3.1.0
    • Fix Version/s: None
    • Component/s: ML, PySpark
    • Labels: None

    Description

      Two typical cases to reproduce it:
      (1)

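      from pyspark.ml import Pipeline
      from pyspark.ml.classification import LogisticRegression
      from pyspark.ml.evaluation import MulticlassClassificationEvaluator
      from pyspark.ml.feature import HashingTF, Tokenizer
      from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

      # Pipeline whose tunable params live on nested stages, not on the Pipeline itself.
      tokenizer = Tokenizer(inputCol="text", outputCol="words")
      hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
      lr = LogisticRegression()
      pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

      paramGrid = ParamGridBuilder() \
          .addGrid(hashingTF.numFeatures, [10, 100]) \
          .addGrid(lr.maxIter, [100, 200]) \
          .build()
      tvs = TrainValidationSplit(estimator=pipeline,
                                 estimatorParamMaps=paramGrid,
                                 evaluator=MulticlassClassificationEvaluator())

      # tvsPath is any writable location for the saved validator.
      tvs.save(tvsPath)
      loadedTvs = TrainValidationSplit.load(tvsPath)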

      Checking `loadedTvs.getEstimatorParamMaps()` then shows that the tuning params `hashingTF.numFeatures` and `lr.maxIter` are lost.

      (2)

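      from pyspark.ml.classification import LogisticRegression, OneVsRest
      from pyspark.ml.evaluation import MulticlassClassificationEvaluator
      from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

      # The tuned param belongs to the classifier nested inside OneVsRest.
      lr = LogisticRegression()
      ova = OneVsRest(classifier=lr)
      grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
      evaluator = MulticlassClassificationEvaluator()
      tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, evaluator=evaluator)

      # tvsPath is any writable location for the saved validator.
      tvs.save(tvsPath)
      loadedTvs = TrainValidationSplit.load(tvsPath)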

      Checking `loadedTvs.getEstimatorParamMaps()` then shows that the tuning param `lr.maxIter` is lost.

      Both CrossValidator and TrainValidationSplit in PySpark have this issue.
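One way to make the loss visible is to compare the param grid before saving with the one returned after loading. The following is only a minimal sketch, not part of Spark: the helpers `summarize_param_maps` and `lost_params` are hypothetical names, and `FakeParam` is a stand-in for `pyspark.ml.param.Param` so the example runs without a Spark session. It assumes that each param key exposes `parent` and `name` attributes and that stage uids are preserved across save and load.

```python
from collections import namedtuple


def summarize_param_maps(param_maps):
    """Flatten param maps into plain dicts keyed by '<parent uid>.<param name>'.

    Only relies on each key having `parent` and `name` attributes.
    """
    return [
        {f"{param.parent}.{param.name}": value for param, value in param_map.items()}
        for param_map in param_maps
    ]


def lost_params(original_maps, reloaded_maps):
    """Return, per grid point, the sorted keys present before saving but absent after loading."""
    before = summarize_param_maps(original_maps)
    after = summarize_param_maps(reloaded_maps)
    # Pad with empty dicts in case the reloaded list is shorter than the original.
    after += [{}] * (len(before) - len(after))
    return [sorted(set(b) - set(a)) for b, a in zip(before, after)]


# Hypothetical stand-in for pyspark.ml.param.Param (assumption, for illustration only).
FakeParam = namedtuple("FakeParam", ["parent", "name"])
max_iter = FakeParam(parent="LogisticRegression_abc", name="maxIter")

original = [{max_iter: 100}, {max_iter: 200}]
reloaded = [{}, {}]  # hypothetical result of a lossy reload

print(lost_params(original, reloaded))
```

With the reproductions above, the corresponding calls would be `lost_params(paramGrid, loadedTvs.getEstimatorParamMaps())` for case (1) and `lost_params(grid, loadedTvs.getEstimatorParamMaps())` for case (2); any non-empty inner list would point to the loss described in this issue.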

          People

            weichenxu123 Weichen Xu