Description
RandomForest currently accepts only a few preset values for the number of features to sample per node: all, sqrt, log2, etc. It should accept any given value, so that the setting can be tuned during model search.
Proposal: If the parameter for specifying the number of features per node is not recognized (as “all”, “sqrt”, etc.), then it will be parsed as a numerical value. The value should be either (a) a real value in [0,1] specifying the fraction of features in each subset or (b) an integer value specifying the number of features in each subset.
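The parsing rule proposed above can be sketched roughly as follows. This is a standalone illustration, not Spark's implementation: the function name `features_per_node`, the `NAMED_STRATEGIES` table, the rounding (ceiling), and the clamp to at least one feature are all assumptions made for the example. It also assumes one way of resolving an ambiguity the proposal leaves open: a string that parses as an integer (such as "1") is treated as a feature count, while a string that only parses as a real number (such as "1.0") is treated as a fraction.

```python
import math

# Hypothetical table of the named strategies mentioned in the description.
# The exact rounding used here (ceiling, at least 1) is an assumption.
NAMED_STRATEGIES = {
    "all": lambda n: n,
    "sqrt": lambda n: max(1, math.ceil(math.sqrt(n))),
    "log2": lambda n: max(1, math.ceil(math.log2(n))),
}


def features_per_node(strategy: str, num_features: int) -> int:
    """Resolve a feature-subset strategy string to a feature count (sketch)."""
    if num_features < 1:
        raise ValueError("num_features must be at least 1")

    key = strategy.strip().lower()

    # 1. Recognized names take priority.
    if key in NAMED_STRATEGIES:
        return NAMED_STRATEGIES[key](num_features)

    # 2. Otherwise try an integer: an absolute number of features.
    try:
        count = int(key)
    except ValueError:
        count = None
    if count is not None:
        if not 1 <= count <= num_features:
            raise ValueError(
                f"feature count {count} is outside [1, {num_features}]"
            )
        return count

    # 3. Otherwise try a real value in [0, 1]: a fraction of the features.
    try:
        fraction = float(key)
    except ValueError:
        raise ValueError(f"unrecognized strategy: {strategy!r}") from None
    if not 0.0 <= fraction <= 1.0:  # also rejects NaN and infinities
        raise ValueError(f"fraction {fraction} is outside [0, 1]")

    # Assumption: always sample at least one feature, even for tiny fractions.
    return max(1, math.ceil(fraction * num_features))
```

Under these assumptions, `features_per_node("0.25", 20)` and `features_per_node("5", 20)` both resolve to 5 features, while `features_per_node("25", 20)` is rejected because it exceeds the number of available features.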
Issue Links
- Is contained by: SPARK-14046 RandomForest improvement umbrella (Resolved)
- is related to: SPARK-14565 RandomForest should use parseInt and parseDouble for feature subset size instead of regexes (Resolved)
- links to