Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.4.1
None
None
Description
The impurity method 'variance' should only be used for regressors, not classifiers. For classifiers, 'gini' and 'entropy' should be available, as is already the case for RandomForestClassifier: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.ml.classification.RandomForestClassifier.html
Because of this bug, the 'minInfoGain' hyperparameter cannot be tuned to combat overfitting.
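
One way to see how the impurity choice interacts with 'minInfoGain' is to compute the information gain of the same candidate split under each impurity measure. The sketch below is plain Python, not Spark's implementation; the helper names (`gini`, `entropy`, `variance`, `info_gain`), the toy labels, and the threshold value are all assumptions made for illustration.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class frequencies p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: -sum(p_k * log2(p_k)) over the class frequencies p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def variance(labels):
    """Population variance of the labels treated as numbers (regression impurity)."""
    n = len(labels)
    if n == 0:
        return 0.0
    mean = sum(labels) / n
    return sum((y - mean) ** 2 for y in labels) / n


def info_gain(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical binary labels and one candidate split of them.
parent = [0, 0, 0, 1, 1, 1]
left, right = [0, 0, 1], [0, 1, 1]

gains = {f.__name__: info_gain(parent, left, right, f) for f in (gini, entropy, variance)}

# Hypothetical threshold playing the role of minInfoGain.
min_info_gain = 0.05
accepted = {name: gain >= min_info_gain for name, gain in gains.items()}

for name in gains:
    print(f"{name}: gain={gains[name]:.4f} accepted={accepted[name]}")
```

For 0/1 labels the variance equals p(1 - p), which is half the Gini impurity, so the same threshold is effectively twice as strict under 'variance' as under 'gini'. In this toy case the split passes the threshold under 'gini' and 'entropy' but is rejected under 'variance'.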