Spark / SPARK-33592

Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading


    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0, 3.1.0
    • Fix Version/s: None
    • Component/s: ML, PySpark
    • Labels: None

      Description

      Two typical cases to reproduce it:
      (1)

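      import tempfile

      from pyspark.sql import SparkSession
      from pyspark.ml import Pipeline
      from pyspark.ml.classification import LogisticRegression
      from pyspark.ml.evaluation import MulticlassClassificationEvaluator
      from pyspark.ml.feature import HashingTF, Tokenizer
      from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

      spark = SparkSession.builder.getOrCreate()  # ML stages need an active SparkContext
      tvsPath = tempfile.mkdtemp() + "/tvs"       # any writable path that does not exist yet

      tokenizer = Tokenizer(inputCol="text", outputCol="words")
      hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
      lr = LogisticRegression()
      pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

      # Tune params that belong to stages nested inside the pipeline
      paramGrid = ParamGridBuilder() \
          .addGrid(hashingTF.numFeatures, [10, 100]) \
          .addGrid(lr.maxIter, [100, 200]) \
          .build()
      tvs = TrainValidationSplit(estimator=pipeline,
                                 estimatorParamMaps=paramGrid,
                                 evaluator=MulticlassClassificationEvaluator())

      tvs.save(tvsPath)
      loadedTvs = TrainValidationSplit.load(tvsPath)
      print(loadedTvs.getEstimatorParamMaps())  # compare with paramGrid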
      
      

      Checking `loadedTvs.getEstimatorParamMaps()` then shows that the tuning params `hashingTF.numFeatures` and `lr.maxIter`, both owned by stages nested inside the Pipeline, are lost.
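      To make that check concrete, a small helper can diff the grid before and after the round trip. This is a minimal sketch, not part of PySpark: `param_keys` and `lost_tuning_params` are hypothetical names, and the helper assumes only that each key of a param map exposes `.parent` (the owning stage's uid) and `.name`, as `pyspark.ml.param.Param` does. It also assumes the reloaded stages keep their original uids, which is how Spark ML persistence normally behaves, so comparing by `(parent, name)` is meaningful across save and load.

```python
from collections import namedtuple
from typing import Any, Iterable, List, Mapping, Set, Tuple

ParamKey = Tuple[str, str]  # (owning stage uid, param name)


def param_keys(param_maps: Iterable[Mapping[Any, Any]]) -> Set[ParamKey]:
    """Collect the (parent uid, param name) pairs used anywhere in a grid."""
    keys: Set[ParamKey] = set()
    for param_map in param_maps:
        for param in param_map:
            # pyspark.ml.param.Param stores the owner's uid in `.parent`
            keys.add((param.parent, param.name))
    return keys


def lost_tuning_params(original: Iterable[Mapping[Any, Any]],
                       loaded: Iterable[Mapping[Any, Any]]) -> List[ParamKey]:
    """Return the tuning params present before saving but missing after loading."""
    return sorted(param_keys(original) - param_keys(loaded))


# Stand-in for pyspark.ml.param.Param so the helper runs without Spark;
# the uids below are made up for illustration.
FakeParam = namedtuple("FakeParam", ["parent", "name"])
num_features = FakeParam("HashingTF_demo", "numFeatures")
max_iter = FakeParam("LogisticRegression_demo", "maxIter")

original = [{num_features: 10, max_iter: 100}, {num_features: 100, max_iter: 200}]
loaded = [{}, {}]  # modelled symptom: the tuning params are gone after reload
print(lost_tuning_params(original, loaded))
# [('HashingTF_demo', 'numFeatures'), ('LogisticRegression_demo', 'maxIter')]
```

      Against a real session the call would be `lost_tuning_params(tvs.getEstimatorParamMaps(), loadedTvs.getEstimatorParamMaps())`; an empty list means nothing was dropped.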

      (2)

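      import tempfile

      from pyspark.sql import SparkSession
      from pyspark.ml.classification import LogisticRegression, OneVsRest
      from pyspark.ml.evaluation import MulticlassClassificationEvaluator
      from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

      spark = SparkSession.builder.getOrCreate()  # ML stages need an active SparkContext
      tvsPath = tempfile.mkdtemp() + "/tvs"       # any writable path that does not exist yet

      lr = LogisticRegression()
      ova = OneVsRest(classifier=lr)  # the tuned param lives on the nested classifier
      grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
      evaluator = MulticlassClassificationEvaluator()
      tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, evaluator=evaluator)

      tvs.save(tvsPath)
      loadedTvs = TrainValidationSplit.load(tvsPath)
      print(loadedTvs.getEstimatorParamMaps())  # compare with grid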
      
      

      Checking `loadedTvs.getEstimatorParamMaps()` then shows that the tuning param `lr.maxIter`, owned by the classifier nested inside OneVsRest, is lost.

      Both CrossValidator and TrainValidationSplit in PySpark have this issue.
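      In both reproductions the tuned params belong to stages nested inside the estimator (the Pipeline's stages, or the OneVsRest classifier) rather than to the estimator itself. One possible way to persist such a grid, sketched here under stated assumptions, is to store each entry as (owner uid, param name, value) and, on load, resolve the owner by walking the whole tree of nested stages instead of only the top-level estimator. `Stage`, `collect_stages`, `encode_grid`, and `decode_grid` are hypothetical pure-Python stand-ins; this is not PySpark's API and not necessarily how the issue is fixed upstream.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing so stages can be dict keys
class Stage:
    """Hypothetical stand-in for a Spark ML stage with optional nested stages."""
    uid: str
    children: List["Stage"] = field(default_factory=list)


ParamMap = Dict[Tuple[Stage, str], Any]      # (owning stage, param name) -> value
Encoded = List[List[Tuple[str, str, Any]]]   # per map: (owner uid, param name, value)


def collect_stages(root: Stage) -> Dict[str, Stage]:
    """Map every uid in the stage tree, nested stages included, to its stage."""
    stages = {root.uid: root}
    for child in root.children:
        stages.update(collect_stages(child))
    return stages


def encode_grid(grid: List[ParamMap]) -> Encoded:
    """Flatten each param map into (uid, name, value) triples for saving."""
    return [[(stage.uid, name, value) for (stage, name), value in pm.items()]
            for pm in grid]


def decode_grid(root: Stage, encoded: Encoded) -> List[ParamMap]:
    """Rebuild the grid against a reloaded tree, resolving owners by uid.

    Looking owners up only on `root` would drop every param that belongs to a
    nested stage; searching the whole tree keeps them.
    """
    stages = collect_stages(root)
    grid: List[ParamMap] = []
    for entries in encoded:
        pm: ParamMap = {}
        for uid, name, value in entries:
            if uid not in stages:
                raise KeyError(f"no stage with uid {uid!r} under {root.uid!r}")
            pm[(stages[uid], name)] = value
        grid.append(pm)
    return grid


# Usage: a OneVsRest-like root whose tuned param lives on a nested classifier.
lr = Stage("LogisticRegression_demo")
ova = Stage("OneVsRest_demo", children=[lr])
grid = [{(lr, "maxIter"): 100}, {(lr, "maxIter"): 200}]
saved = encode_grid(grid)

# Simulate reloading: fresh objects that keep the same uids.
lr2 = Stage("LogisticRegression_demo")
ova2 = Stage("OneVsRest_demo", children=[lr2])
restored = decode_grid(ova2, saved)
print([{(s.uid, n): v for (s, n), v in pm.items()} for pm in restored])
# [{('LogisticRegression_demo', 'maxIter'): 100}, {('LogisticRegression_demo', 'maxIter'): 200}]
```

      The design choice is to key saved entries by uid rather than by object identity, since object identity does not survive a save and reload, while uids do.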


            People

            • Assignee: Weichen Xu (weichenxu123)
            • Reporter: Weichen Xu (weichenxu123)
            • Votes: 0
            • Watchers: 2
