Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1, 3.3.0
-
None
-
None
-
Python 3.6
Spark 3.2
Description
The name set for rawPredictionCol in OneVsRest does not persist after saving and loading a trained model: the loaded model reverts to the default name, 'rawPrediction'. This becomes an issue when I try to stack multiple OneVsRest models in a pipeline. Code example below.
from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel

data_path = "/sample_multiclass_classification_data.txt"
df = spark.read.format("libsvm").load(data_path)
lr = LinearSVC(regParam=0.01)

# set the name of rawPrediction column
ovr = OneVsRest(classifier=lr, rawPredictionCol='raw_prediction')
print(ovr.getRawPredictionCol())

model = ovr.fit(df)
model_path = 'temp' + "/ovr_model"

# save and read back in
model.write().overwrite().save(model_path)
model2 = OneVsRestModel.load(model_path)
model2.getRawPredictionCol()

Output:
raw_prediction
'rawPrediction'
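A possible workaround until this is fixed is to re-apply the column name after loading. This is only a minimal sketch, not a fix for the persistence itself. The helper name restore_raw_prediction_col is hypothetical, and it assumes the loaded model exposes getRawPredictionCol() and setRawPredictionCol(), which I believe OneVsRestModel does in PySpark 3.0 and later.

```python
def restore_raw_prediction_col(model, expected_col):
    """Hypothetical helper: re-apply rawPredictionCol if loading reset it.

    Assumes `model` exposes getRawPredictionCol() and setRawPredictionCol(),
    as OneVsRestModel is believed to do in PySpark 3.0+.
    """
    # Only touch the param when the loaded value differs from the expected one.
    if model.getRawPredictionCol() != expected_col:
        model.setRawPredictionCol(expected_col)
    return model


# Intended usage with the reproduction above (requires a Spark session):
# model2 = OneVsRestModel.load(model_path)
# model2 = restore_raw_prediction_col(model2, "raw_prediction")
```

The expected name has to be tracked by the caller, since it is exactly the value that does not survive the save and load round trip.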