[MADLIB-925] Improve RF output format for variable importance (and new DT/RF impurity importance) - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: v1.15
Component/s: Module: Random Forest
Labels:
- starter

Description

As a user,
I want an easier way to access the variable importance output from random forest, so that I can see which variables are the most important.

The current method for getting the importance of each variable in tabular form (assuming the output table is named `rf_output`, so that training also creates `rf_output_group` and `rf_output_summary`):
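```
-- Pair each categorical feature name (comma-separated text in the summary table)
-- with its importance score (array in the group table).
SELECT unnest(regexp_split_to_array(cat_features, ',')) AS variable,
       unnest(cat_var_importance) AS importance
FROM rf_output_group, rf_output_summary;
```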

This query is cumbersome to write, and it has to be written twice: once for categorical features and once for continuous features.
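As a sketch of the workaround this issue aims to replace, the two queries can be merged into one with `UNION ALL` and sorted so the most important variables come first. This assumes the continuous-feature counterparts are named `con_features` (in `rf_output_summary`) and `con_var_importance` (in `rf_output_group`). If the model was trained with grouping columns, add them to each select list so that scores from different groups are not mixed together.

```sql
-- Sketch only: assumes con_features / con_var_importance mirror the
-- cat_features / cat_var_importance columns used in the query above.
SELECT 'categorical' AS feature_type,
       unnest(regexp_split_to_array(s.cat_features, ',')) AS variable,
       unnest(g.cat_var_importance) AS importance
FROM rf_output_group g, rf_output_summary s
UNION ALL
-- Same pairing for continuous features.
SELECT 'continuous',
       unnest(regexp_split_to_array(s.con_features, ',')),
       unnest(g.con_var_importance)
FROM rf_output_group g, rf_output_summary s
-- ORDER BY applies to the combined result, using the first branch's column name.
ORDER BY importance DESC;
```

Pairing names and scores with two `unnest` calls in the same select list relies on both arrays having the same length, which holds here because each importance array has one entry per feature name.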

Issue Links

links to

GitHub Pull Request #295

People

Assignee: Nandish Jayaram

Reporter: Frank McQuillan

Votes: 0

Watchers: 4

Dates

Created: 16/Nov/15 21:26

Updated: 01/Aug/18 22:31

Resolved: 01/Aug/18 20:06