Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-23409

RandomForest/DecisionTree (syntactic) pruning of redundant subtrees

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Duplicate
    • 2.2.1
    • None
    • MLlib
    • None

    Description

      Improvement: redundancy elimination from decision trees where all the leaves of a given subtree share the same prediction.

      Benefits:

      • Model interpretability
      • Faster unitary model invocation (relevant for massive number of invocations)
      • Smaller model memory footprint

      For instance, consider the following decision tree.

      Original Decision Tree
      DecisionTreeClassificationModel (uid=dtc_e794a5a3aa9e) of depth 3 with 15 nodes
        If (feature 1 <= 0.5)
         If (feature 2 <= 0.5)
          If (feature 0 <= 0.5)
           Predict: 0.0
          Else (feature 0 > 0.5)
           Predict: 0.0
         Else (feature 2 > 0.5)
          If (feature 0 <= 0.5)
           Predict: 0.0
          Else (feature 0 > 0.5)
           Predict: 0.0
        Else (feature 1 > 0.5)
         If (feature 2 <= 0.5)
          If (feature 0 <= 0.5)
           Predict: 1.0
          Else (feature 0 > 0.5)
           Predict: 1.0
         Else (feature 2 > 0.5)
          If (feature 0 <= 0.5)
           Predict: 0.0
          Else (feature 0 > 0.5)
           Predict: 0.0
      

      The proposed method, taken as input the first tree, aims at producing as output the following (semantically equivalent) tree:

      Pruned Decision Tree
      DecisionTreeClassificationModel (uid=dtc_e794a5a3aa9e) of depth 3 with 15 nodes
        If (feature 1 <= 0.5)
         Predict: 0.0
        Else (feature 1 > 0.5)
         If (feature 2 <= 0.5)
          Predict: 1.0
         Else (feature 2 > 0.5)
          Predict: 0.0
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              asolimando Alessandro Solimando
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: