Details
- Type: Umbrella
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- None
- None
Description
This is an umbrella for improvements to decision tree learning. This includes:
- DecisionTreeClassifier
- DecisionTreeRegressor
- aspects of tree ensembles specific to learning individual trees, i.e., issues which will also affect DecisionTreeClassifier/Regressor
Issue Links
- contains
  - SPARK-34591 Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to diagnose and impossible to correct (In Progress)
  - SPARK-19591 Add sample weights to decision trees (Resolved)
  - SPARK-15699 Add chi-squared test statistic as a split quality metric for decision trees (Resolved)
  - SPARK-3162 Train DecisionTree locally when possible (Resolved)
  - SPARK-3717 DecisionTree, RandomForest: Partition by feature (Resolved)
  - SPARK-12301 Remove final from classes in spark.ml trees and ensembles where possible (Resolved)
  - SPARK-14351 Optimize ImpurityAggregator for decision trees (Resolved)
  - SPARK-22451 Reduce decision tree aggregate size for unordered features from O(2^numCategories) to O(numCategories) (Resolved)
  - SPARK-3155 Support DecisionTree pruning (Resolved)
  - SPARK-3159 Check for reducible DecisionTree (Resolved)
  - SPARK-3165 DecisionTree does not use sparsity in data (Resolved)
  - SPARK-3380 DecisionTree: overflow and precision in aggregation (Resolved)
  - SPARK-3383 DecisionTree aggregate size could be smaller (Resolved)
  - SPARK-3723 DecisionTree, RandomForest: Add more instrumentation (Resolved)
  - SPARK-14043 Remove restriction on maxDepth for decision trees (Resolved)
  - SPARK-16957 Use weighted midpoints for split values. (Resolved)
  - SPARK-14216 ML tree models should have a standardized, reusable feature importance test (Resolved)
- is related to
  - SPARK-14046 RandomForest improvement umbrella (Resolved)
  - SPARK-14047 GBT improvement umbrella (Resolved)