Description
Add feature importance to random forest models.
If there is interest in this feature, I could implement it with guidance from a mentor (for API decisions, etc.). A description of the feature follows:
Decision trees intrinsically perform feature selection by selecting appropriate split points. This information can be used to assess the relative importance of a feature.
Relative feature importance gives valuable insight into a decision tree or tree ensemble and can even be used for feature selection.
More information on feature importance (via decrease in impurity) can be found in The Elements of Statistical Learning, 2nd edition (Section 10.13.1), or in the scikit-learn documentation [1].
R's randomForest package uses a different technique for assessing variable importance that is based on permutation tests.
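For comparison, here is a minimal sketch of the permutation idea in plain Python. It is not the randomForest implementation: that package works on out-of-bag samples per tree, while this sketch assumes a single held-out set. The names `accuracy`, `permutation_importance`, `predict`, `n_repeats`, and `seed` are hypothetical and chosen only for illustration.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) matches the label."""
    return sum(1 for row, label in zip(X, y) if predict(row) == label) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column is shuffled.

    Assumption: X is a list of lists (rows of feature values) and
    predict maps one row to a predicted label.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    n_features = len(X[0])
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of exactly zero under this scheme, since shuffling it cannot change any prediction.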
All the information needed to compute relative importance scores should already be available in the tree representation (class Node: split, impurity gain, and possibly the (weighted) number of samples).
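The impurity-based computation could be sketched roughly as follows, in plain Python for brevity. This is an illustration under stated assumptions, not the proposed Spark API: the `Node` dataclass and its fields (`feature`, `gain`, `n_samples`, `left`, `right`) are hypothetical stand-ins for the real tree representation, and weighting each split's gain by the number of samples reaching the node is one common convention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical stand-in for the tree's node class; field names are assumptions.
    feature: Optional[int] = None   # index of the split feature; None for a leaf
    gain: float = 0.0               # impurity gain achieved by the split
    n_samples: float = 0.0          # (weighted) number of samples reaching the node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_importances(root: Optional[Node], n_features: int) -> List[float]:
    """Sum sample-weighted impurity gain per feature, normalized to sum to 1."""
    scores = [0.0] * n_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None or node.feature is None:
            continue  # leaves contribute nothing
        scores[node.feature] += node.gain * node.n_samples
        stack.extend([node.left, node.right])
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores


def forest_importances(roots: List[Node], n_features: int) -> List[float]:
    """Average the normalized per-tree importances, then renormalize."""
    sums = [0.0] * n_features
    for root in roots:
        for i, s in enumerate(tree_importances(root, n_features)):
            sums[i] += s
    total = sum(sums)
    return [s / total for s in sums] if total > 0 else sums
```

Normalizing per tree before averaging keeps a single deep tree with large raw gains from dominating the ensemble score; whether that is the right choice here is one of the API decisions left open.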
[1] http://scikit-learn.org/stable/modules/ensemble.html#feature-importance-evaluation
Issue Links
- is blocked by: SPARK-6885 Decision trees: predict class probabilities (Resolved)
- is related to: SPARK-7674 R-like stats for ML models (Resolved)
- relates to: SPARK-9904 User guide for ML tree algorithms (Closed)
- links to
1. Expose featureImportances on org.apache.spark.mllib.tree.RandomForest | Resolved | Unassigned