SPARK-3159: Check for reducible DecisionTree


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: MLlib
    • Labels:
      None

      Description

      Improvement: test-time computation

      Currently, two leaf nodes with the same parent can both output the same prediction. This happens because the splitting criterion (e.g., Gini impurity) is not the same as prediction accuracy or MSE: a split can improve the criterion even when both children would still output the same prediction (e.g., the majority label for classification).
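A small worked example may make this concrete. The class counts below are made up for illustration (they do not come from the issue), and `gini` and `majority` are hypothetical helper names, not MLlib functions. The split lowers the weighted Gini impurity, yet both children still predict class 0.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def majority(counts):
    """Index of the most frequent class, i.e. the label a leaf would predict."""
    return max(range(len(counts)), key=lambda i: counts[i])


# Illustrative counts (assumed): [class 0, class 1]
parent = [70, 30]
left, right = [40, 10], [30, 20]

n = sum(parent)
weighted_children = (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)
gain = gini(parent) - weighted_children

print(f"parent impurity:   {gini(parent):.2f}")
print(f"children impurity: {weighted_children:.2f}")
print(f"gain:              {gain:.2f}")
print("predictions:", majority(left), majority(right))
```

The gain is positive, so a Gini-based learner would accept the split, even though it changes no prediction.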

      After training, we could check the tree and reduce it where possible, e.g., by merging sibling leaves that output the same prediction into their parent.

      Note: This happens with scikit-learn as well.
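The reduction described above could be sketched as a bottom-up pass over the trained tree. This is a minimal illustration, not MLlib's implementation: the `Node` class and the `prune` and `count_nodes` functions are hypothetical, and the sketch assumes a binary tree in which every internal node has exactly two children.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; a leaf has no children."""
    prediction: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def prune(node: Node) -> Node:
    """Return a copy of the tree with redundant sibling leaves merged.

    Children are pruned first, so merges propagate upward: a subtree whose
    leaves all agree collapses into a single leaf.
    """
    if node.is_leaf:
        return node
    left = prune(node.left)
    right = prune(node.right)
    if left.is_leaf and right.is_leaf and left.prediction == right.prediction:
        # Both branches give the same answer, so the split is redundant.
        return Node(prediction=left.prediction)
    return Node(node.prediction, left, right)


def count_nodes(node: Optional[Node]) -> int:
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


# The left subtree splits but predicts 1.0 on both sides.
tree = Node(0.0, left=Node(1.0, Node(1.0), Node(1.0)), right=Node(0.0))
pruned = prune(tree)
print(count_nodes(tree), "->", count_nodes(pruned))
```

Predictions are unchanged by construction, since a merged leaf returns the value both of its children returned; only the number of nodes visited at test time shrinks.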


              People

              • Assignee: Alessandro Solimando (asolimando)
              • Reporter: Joseph K. Bradley (josephkb)
              • Votes: 1
              • Watchers: 6
