Description
The helper classes for decision trees and decision tree ensembles (e.g. Impurity, InformationGainStats, ImpurityStats, DTStatsAggregator) currently reside in spark.mllib. As the algorithm implementations move to spark.ml, these helper classes should move with them.
We should take this opportunity to make these helper classes private where possible (especially those needed only during training) and to consider changing their APIs (especially where doing so would eliminate duplicate data stored in the final model).
Issue Links
- duplicates SPARK-16728 migrate internal API for MLlib trees from spark.mllib to spark.ml (Resolved)