Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.2.1
- Fix Version/s: None
- Component/s: None
Description
Improvement: eliminate redundant splits from decision trees, i.e. collapse any subtree whose leaves all share the same prediction into a single leaf.
Benefits:
- Improved model interpretability
- Faster per-invocation prediction (relevant when the model is invoked a massive number of times)
- Smaller model memory footprint
For instance, consider the following decision tree.
Original Decision Tree
DecisionTreeClassificationModel (uid=dtc_e794a5a3aa9e) of depth 3 with 15 nodes
  If (feature 1 <= 0.5)
   If (feature 2 <= 0.5)
    If (feature 0 <= 0.5)
     Predict: 0.0
    Else (feature 0 > 0.5)
     Predict: 0.0
   Else (feature 2 > 0.5)
    If (feature 0 <= 0.5)
     Predict: 0.0
    Else (feature 0 > 0.5)
     Predict: 0.0
  Else (feature 1 > 0.5)
   If (feature 2 <= 0.5)
    If (feature 0 <= 0.5)
     Predict: 1.0
    Else (feature 0 > 0.5)
     Predict: 1.0
   Else (feature 2 > 0.5)
    If (feature 0 <= 0.5)
     Predict: 0.0
    Else (feature 0 > 0.5)
     Predict: 0.0
The proposed method, taking the first tree as input, would produce the following semantically equivalent tree as output:
Pruned Decision Tree
DecisionTreeClassificationModel (uid=dtc_e794a5a3aa9e) of depth 2 with 5 nodes
  If (feature 1 <= 0.5)
   Predict: 0.0
  Else (feature 1 > 0.5)
   If (feature 2 <= 0.5)
    Predict: 1.0
   Else (feature 2 > 0.5)
    Predict: 0.0
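The pruning itself can be sketched as a bottom-up pass over the tree: recursively prune both children, then collapse any split whose two children are leaves with identical predictions. This is a minimal illustrative sketch using a hypothetical `Node` structure, not Spark's internal node classes:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical node structure for illustration only; Spark's internal
# representation differs.
@dataclass
class Node:
    prediction: float
    feature: Optional[int] = None      # split feature index (None for a leaf)
    threshold: Optional[float] = None  # split threshold (None for a leaf)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def prune(node: Node) -> Node:
    """Bottom-up pass: collapse any split whose children are both leaves
    with the same prediction into a single leaf."""
    if node.is_leaf:
        return node
    node.left = prune(node.left)
    node.right = prune(node.right)
    if (node.left.is_leaf and node.right.is_leaf
            and node.left.prediction == node.right.prediction):
        return Node(prediction=node.left.prediction)  # redundant split removed
    return node

def count_nodes(node: Node) -> int:
    return 1 if node.is_leaf else 1 + count_nodes(node.left) + count_nodes(node.right)

# Helpers to build the 15-node example tree from the description
# (all thresholds are 0.5 in that example).
def leaf(p: float) -> Node:
    return Node(prediction=p)

def split(f: int, l: Node, r: Node) -> Node:
    return Node(prediction=l.prediction, feature=f, threshold=0.5, left=l, right=r)

tree = split(1,
             split(2,
                   split(0, leaf(0.0), leaf(0.0)),
                   split(0, leaf(0.0), leaf(0.0))),
             split(2,
                   split(0, leaf(1.0), leaf(1.0)),
                   split(0, leaf(0.0), leaf(0.0))))

pruned = prune(tree)  # count_nodes(pruned) == 5, matching the pruned tree above
```

Because the collapse test runs only after both children have been pruned, nested redundancy folds up correctly: in the example, the two `feature 0` splits under `feature 2 <= 0.5` each collapse to a `0.0` leaf, which then lets the enclosing `feature 2` split collapse as well.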
Issue Links
- duplicates: SPARK-3159 Check for reducible DecisionTree (Resolved)