SPARK-3272

Calculate prediction for nodes separately from calculating information gain for splits in decision tree

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.2.0
    • Component/s: MLlib
    • Labels: None

    Description

      In the current implementation, the prediction for a node is calculated together with the information gain stats for each candidate split, even though the value to predict for a node depends only on the node itself, not on which split is chosen.
      To save computation, we can calculate the prediction once per node first and then calculate the information gain stats for each split.

      This is also necessary if we want to support a minimum-instances-per-node parameter (SPARK-2207): when no split satisfies the minimum instances requirement, we don't use the information gain of any split, so there must be a separate way to get the prediction value.
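
A minimal sketch of the proposed separation, in plain Python rather than MLlib's Scala. All names here (`entropy`, `predict`, `best_split`, `min_instances_per_node`) are hypothetical and chosen for illustration; this is not the MLlib API or the actual patch. It assumes a classification node with a non-empty label list, entropy as the impurity measure, and splits given as (feature index, threshold) pairs.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits of a list of class labels (0.0 if empty)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def predict(labels):
    """Majority class and its fraction; depends only on the node's labels."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def best_split(rows, labels, candidate_splits, min_instances_per_node=1):
    """Return (prediction, best), where best is (gain, split) or None.

    rows: list of feature tuples, aligned with labels.
    candidate_splits: list of (feature_index, threshold) pairs (assumed format).
    """
    # Step 1: the prediction is computed once, before any split is examined.
    prediction = predict(labels)
    parent_impurity = entropy(labels)
    n = len(labels)

    # Step 2: information gain is computed per split, independently of step 1.
    best = None
    for feature, threshold in candidate_splits:
        left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
        right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
        # Skip splits that leave a child with too few instances.
        if min(len(left), len(right)) < min_instances_per_node:
            continue
        gain = (parent_impurity
                - len(left) / n * entropy(left)
                - len(right) / n * entropy(right))
        if best is None or gain > best[0]:
            best = (gain, (feature, threshold))

    # The prediction is available even when best is None.
    return prediction, best
```

With this shape, a node for which every candidate split is rejected still gets a prediction and can simply be turned into a leaf.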

          People

            Assignee: Qiping Li (chouqin)
            Reporter: Qiping Li (chouqin)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: