Spark / SPARK-39544

setPredictionCol for OneVsRest does not persist when saving model to disk


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1, 3.3.0
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None
    • Environment: Python 3.6, Spark 3.2

    Description

      The name set for rawPredictionCol in OneVsRest does not persist after saving and loading a trained model. This becomes an issue when I try to stack multiple OneVsRest models in a pipeline. Code example below.

      from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel
      
      data_path = "/sample_multiclass_classification_data.txt"
      df = spark.read.format("libsvm").load(data_path)
      lr = LinearSVC(regParam=0.01)
      
      # set the name of rawPrediction column
      ovr = OneVsRest(classifier=lr, rawPredictionCol = 'raw_prediction')
      print(ovr.getRawPredictionCol())
      
      model = ovr.fit(df)
      model_path = 'temp' + "/ovr_model"
      
      # save and read back in
      model.write().overwrite().save(model_path)
      model2 = OneVsRestModel.load(model_path)
      model2.getRawPredictionCol()
      
      Output:
      raw_prediction
      'rawPrediction' 
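      A possible workaround sketch, not part of the original report and not verified against the affected versions: re-apply the column name after loading. It assumes the loaded OneVsRestModel exposes setRawPredictionCol alongside getRawPredictionCol; the helper name restore_raw_prediction_col is hypothetical.

```python
def restore_raw_prediction_col(model, expected_col):
    """Re-apply rawPredictionCol on a loaded model if it came back under a different name.

    Assumption: `model` exposes getRawPredictionCol() and setRawPredictionCol(),
    as a loaded OneVsRestModel is expected to. The helper itself is hypothetical.
    """
    if model.getRawPredictionCol() != expected_col:
        model.setRawPredictionCol(expected_col)
    return model


# Usage, continuing from the example above (needs a running Spark session):
# model2 = restore_raw_prediction_col(OneVsRestModel.load(model_path), "raw_prediction")
# model2.getRawPredictionCol()
```

      This only patches the in-memory model after each load; the metadata written to disk is unchanged, so the call has to be repeated wherever the model is read back.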

       

       

          People

            Assignee: Unassigned
            Reporter: kobakhit koba
