Add feature importance to random forest models.
If people are interested in this feature, I could implement it given a mentor (for API decisions, etc.). Please find a description of the feature below:
Decision trees intrinsically perform feature selection by selecting appropriate split points. This information can be used to assess the relative importance of a feature.
Relative feature importance gives valuable insight into a decision tree or tree ensemble and can even be used for feature selection.
More information on feature importance (via decrease in impurity) can be found in ESLII (Section 10.13.1) or here.
R's randomForest package uses a different technique for assessing variable importance that is based on permutation tests.
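For comparison, the general idea behind permutation-based importance can be sketched as follows. This is a minimal illustration in plain Python and does not attempt to reproduce the procedure used by R's randomForest package. The names `accuracy` and `permutation_importance`, the `predict` callable, and the use of accuracy as the score are all assumptions made for the example.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the (hypothetical) predictor matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_features, seed=0):
    """Importance of feature j = drop in accuracy after shuffling column j."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for j in range(n_features):
        # Shuffle one column while leaving all other columns untouched.
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                  for row, v in zip(X, column)]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return drops
```

A feature the model never consults gets a score of exactly zero, since shuffling it cannot change any prediction.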
All necessary information to create relative importance scores should be available in the tree representation (class Node: split, impurity gain, (weighted) number of samples?).
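To make the proposal concrete, here is a minimal sketch of how impurity-based importance could be computed from the node-level information listed above. It is written in plain Python rather than against MLlib. The `Node` dataclass, its field names, and the functions `tree_importances` and `forest_importances` are hypothetical and do not correspond to the actual MLlib classes. Weighting each gain by the number of samples at the node is one common convention and is assumed here; if the stored gain is already weighted, that factor would be dropped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node for illustration; not the MLlib Node class."""
    feature: Optional[int] = None   # index of the split feature, None for a leaf
    gain: float = 0.0               # impurity gain achieved by the split
    n_samples: float = 0.0          # (weighted) number of samples reaching the node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_importances(root: Node, n_features: int) -> List[float]:
    """Sum gain * n_samples per feature over internal nodes, then normalize to 1."""
    totals = [0.0] * n_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None:    # leaves contribute nothing
            continue
        totals[node.feature] += node.gain * node.n_samples
        stack.extend(c for c in (node.left, node.right) if c is not None)
    norm = sum(totals)
    return [t / norm for t in totals] if norm > 0 else totals


def forest_importances(roots: List[Node], n_features: int) -> List[float]:
    """Average the normalized per-tree importances across the ensemble."""
    per_tree = [tree_importances(r, n_features) for r in roots]
    return [sum(col) / len(per_tree) for col in zip(*per_tree)]
```

Normalizing per tree before averaging keeps any single tree with unusually large gains from dominating the ensemble score; averaging the raw totals instead would be an equally plausible API decision.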
Linked issue: Expose featureImportances on org.apache.spark.mllib.tree.RandomForest (status: Resolved; assignee: Unassigned)