  Spark / SPARK-3160

Simplify DecisionTree data structure for training


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: MLlib
    • Labels: None

    Description

      Improvement: code clarity

      Currently, we maintain a tree structure, a flat array of nodes, and a parentImpurities array.

      Proposed fix: Maintain everything within a growing tree structure.

      This would let us eliminate the flat array of nodes, saving storage whenever we do not grow a full tree. It could also make it easier to pass subtrees to worker (compute) nodes for local training.
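      As a rough illustration of the storage argument, here is a minimal Python sketch (not the MLlib code). It assumes the flat array is preallocated for a full binary tree of a given depth, and that nodes use heap-style ids (root 1, children 2i and 2i+1). `GrowingNode`, `count_nodes`, and `full_tree_slots` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GrowingNode:
    """Hypothetical tree node that carries its own training metadata."""
    node_id: int
    impurity: float
    left: Optional["GrowingNode"] = None
    right: Optional["GrowingNode"] = None

    def split(self, left_impurity: float,
              right_impurity: float) -> Tuple["GrowingNode", "GrowingNode"]:
        # Children are allocated only when a node is actually split.
        # Assumed heap-style ids: children of node i are 2i and 2i + 1.
        self.left = GrowingNode(2 * self.node_id, left_impurity)
        self.right = GrowingNode(2 * self.node_id + 1, right_impurity)
        return self.left, self.right


def count_nodes(node: Optional[GrowingNode]) -> int:
    """Number of nodes actually allocated in the (sub)tree."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def full_tree_slots(max_depth: int) -> int:
    """Slots a flat array needs to hold a full binary tree of this depth."""
    return 2 ** (max_depth + 1) - 1


# Grow an unbalanced tree: only the left branch is split further.
root = GrowingNode(node_id=1, impurity=0.5)
left, right = root.split(left_impurity=0.3, right_impurity=0.0)
left.split(left_impurity=0.1, right_impurity=0.0)

allocated = count_nodes(root)      # nodes held by the growing tree
reserved = full_tree_slots(3)      # slots a depth-3 flat array would reserve
```

      In this sketch, `left` is itself a self-contained subtree with its impurities attached, so it could be handed to a worker without a separate lookup into a node array or a parentImpurities array.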

      Note:

      • This JIRA previously included a second item: We could have a “LearningNode extends Node” setup where the LearningNode holds metadata for learning (such as impurities). The test-time model could be extracted from this training-time model, so that extra information (such as impurities) does not have to be kept after training.
      • However, that is really a separate issue, so I removed it from this JIRA.
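      For readers unfamiliar with the “LearningNode extends Node” idea mentioned in the note, a minimal Python sketch of one possible shape follows. It is an illustration only, not the MLlib design: the fields, the `to_node` method, and the single-feature threshold split are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical test-time node: only what prediction needs."""
    prediction: float
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def predict(self, x: Sequence[float]) -> float:
        # A node without both children is treated as a leaf.
        if self.left is None or self.right is None:
            return self.prediction
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)


@dataclass
class LearningNode(Node):
    """Hypothetical training-time node: adds learning metadata."""
    impurity: float = 0.0

    def to_node(self) -> Node:
        # Copy the structure into plain Nodes, dropping the impurity.
        # Assumes every child of a LearningNode is also a LearningNode.
        return Node(
            prediction=self.prediction,
            feature=self.feature,
            threshold=self.threshold,
            left=self.left.to_node() if self.left is not None else None,
            right=self.right.to_node() if self.right is not None else None,
        )


# Training-time tree with impurities attached to every node.
training_root = LearningNode(
    prediction=0.5, feature=0, threshold=0.5, impurity=0.5,
    left=LearningNode(prediction=0.0, impurity=0.0),
    right=LearningNode(prediction=1.0, impurity=0.0),
)

# Extracted test-time model: same predictions, no training metadata.
model = training_root.to_node()
```

      The extracted `model` predicts the same values as `training_root` but its nodes no longer carry an `impurity` field.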

          People

            Assignee: Joseph K. Bradley (josephkb)
            Reporter: Joseph K. Bradley (josephkb)
            Votes: 0
            Watchers: 3