  1. Apache MADlib
  2. MADLIB-925

Improve RF output format for variable importance (and new DT/RF impurity importance)

Details

    Description

      As a user,
      I want to have an easier way of accessing the variable importance output from random forest so that I can understand which are the most important variables.

      The current way to get the importance of each variable in tabular form (assuming the output table is named `rf_output`):
      ```
      SELECT unnest(regexp_split_to_array(cat_features, ',')) as variable,
      unnest(cat_var_importance) as importance
      FROM rf_output_group, rf_output_summary;
      ```

      This query is cumbersome to write, and it has to be written twice: once for the categorical features and once for the continuous features.
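
      For illustration, the two queries can be combined into one result set. This is only a sketch of the existing workaround, not the proposed new output format. It assumes the continuous-feature columns are named `con_features` (in `rf_output_summary`) and `con_var_importance` (in `rf_output_group`), mirroring the categorical columns shown above; those two names do not appear in this issue and should be checked against the actual output tables.
      ```sql
      -- Categorical features: split the comma-separated names and pair them
      -- with the importance array, element by element.
      SELECT unnest(regexp_split_to_array(cat_features, ',')) AS variable,
             unnest(cat_var_importance) AS importance
      FROM rf_output_group, rf_output_summary
      UNION ALL
      -- Continuous features: same pattern. The column names con_features and
      -- con_var_importance are assumptions, not taken from this issue.
      SELECT unnest(regexp_split_to_array(con_features, ',')) AS variable,
             unnest(con_var_importance) AS importance
      FROM rf_output_group, rf_output_summary
      ORDER BY importance DESC;
      ```
      Even in combined form, the query depends on the name list and the importance array having the same length and order, which is part of what makes the current format awkward to use.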

            People

              njayaram Nandish Jayaram
              fmcquillan Frank McQuillan