Spark / SPARK-3159

Check for reducible DecisionTree

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: MLlib
    • Labels: None

    Description

      Improvement: test-time computation

      Currently, two leaf nodes with the same parent can both output the same prediction. This happens because the splitting criterion (e.g., Gini impurity) is not the same as prediction accuracy/MSE: a split can improve the criterion even when both children still output the same prediction (e.g., the majority label for classification). Such a split adds a comparison at test time without changing any prediction.
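
To make the mismatch between the splitting criterion and the prediction concrete, here is a small worked example. The class counts are made up for illustration and do not come from this issue; the helper names `gini` and `majority` are hypothetical and are not part of MLlib.

```python
def gini(counts):
    """Gini impurity of a node, given its per-class label counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def majority(counts):
    """Index of the most frequent class, i.e. the label a leaf would predict."""
    return max(range(len(counts)), key=lambda k: counts[k])


# Illustrative (made-up) counts for classes [0, 1].
parent = [80, 20]
left = [50, 5]
right = [30, 15]

n = sum(parent)
weighted_child_gini = (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)
gain = gini(parent) - weighted_child_gini

# The split lowers the impurity (gain is about 0.029), so the criterion accepts it,
# yet both children still predict class 0, exactly as the parent would.
print(f"gain = {gain:.4f}")
print("predictions:", majority(left), majority(right), majority(parent))
```

With these counts the split has a positive Gini gain, so a trainer driven by Gini alone would keep it even though it cannot change any classification result.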

      After training, we could check the tree and reduce it where possible, i.e., merge sibling leaves that output the same prediction into their parent.
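
One possible way to perform such a reduction is a bottom-up pass over the trained tree. The following is a minimal sketch, not the change that was merged into Spark: the `Node` class and the `prune` and `count_nodes` functions are hypothetical and do not reflect MLlib's internal tree representation. It assumes every internal node has exactly two children.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; split details are omitted for brevity."""
    prediction: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def prune(node: Node) -> Node:
    """Return a reduced copy of the tree rooted at `node`.

    Children are reduced first, so a merge near the bottom can enable
    another merge further up the tree.
    """
    if node.is_leaf:
        return node
    left = prune(node.left)
    right = prune(node.right)
    # Two sibling leaves with the same output make the split redundant.
    if left.is_leaf and right.is_leaf and left.prediction == right.prediction:
        return Node(prediction=left.prediction)
    return Node(node.prediction, left, right)


def count_nodes(node: Optional[Node]) -> int:
    """Count all nodes in the tree, internal nodes and leaves alike."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


# A tree whose right subtree splits into two leaves that both predict 1.0.
tree = Node(0.0, Node(0.0), Node(1.0, Node(1.0), Node(1.0)))
reduced = prune(tree)
print(count_nodes(tree), "->", count_nodes(reduced))
```

Because the pass only removes splits whose two leaves agree, the reduced tree returns the same prediction as the original for every input while evaluating fewer nodes at test time.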

      Note: This happens with scikit-learn as well.

            People

              Assignee: Alessandro Solimando (asolimando)
              Reporter: Joseph K. Bradley (josephkb)
              Votes: 1
              Watchers: 8
