[SPARK-3160] Simplify DecisionTree data structure for training - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.2.0
Component/s: MLlib
Labels: None

Description

Improvement: code clarity

Currently, we maintain a tree structure, a flat array of nodes, and a parentImpurities array.

Proposed fix: Maintain everything within a growing tree structure.

This would let us eliminate the flat array of nodes, saving storage when the tree is not grown to full depth. It could also make it easier to send subtrees to individual compute nodes for local training.
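The proposed layout can be illustrated with a minimal sketch, written in Python for readability. Spark's actual implementation is in Scala and is not reproduced here. The names `Node`, `split`, `count_nodes`, and `flat_array_size` are hypothetical, as is the 1-based heap-style node id scheme (children of node `i` get ids `2i` and `2i + 1`). Each node stores its own impurity, so no separate parentImpurities array is needed. Nodes are allocated only when a split creates them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical growing-tree node that carries its own training metadata."""
    node_id: int
    impurity: float  # kept on the node, replacing a separate parentImpurities array
    split_feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def flat_array_size(max_depth: int) -> int:
    """Slots a flat array needs for a full binary tree (root at depth 0)."""
    return 2 ** (max_depth + 1) - 1


def count_nodes(node: Optional[Node]) -> int:
    """Number of nodes actually allocated in the growing tree."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def split(node: Node, feature: int, threshold: float,
          left_impurity: float, right_impurity: float) -> None:
    """Grow the tree in place by attaching two children to a leaf.

    Child ids use an assumed heap-style scheme (2i, 2i + 1) for illustration.
    """
    assert node.is_leaf, "only leaves can be split"
    node.split_feature = feature
    node.threshold = threshold
    node.left = Node(2 * node.node_id, left_impurity)
    node.right = Node(2 * node.node_id + 1, right_impurity)


# Usage: grow an unbalanced tree down to depth 3.
root = Node(node_id=1, impurity=0.5)
split(root, feature=0, threshold=2.5, left_impurity=0.3, right_impurity=0.0)
split(root.left, feature=1, threshold=1.0, left_impurity=0.1, right_impurity=0.0)
split(root.left.left, feature=0, threshold=0.7, left_impurity=0.0, right_impurity=0.0)
print(count_nodes(root), "nodes vs", flat_array_size(3), "flat-array slots")
```

In this example a depth-3 tree holds 7 nodes, while a flat array sized for a full tree of the same depth needs 15 slots. The gap widens as unbalanced trees grow deeper.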

Note:

This JIRA originally included another item: we could have a “LearningNode extends Node” setup, where the LearningNode holds metadata for learning (such as impurities). The test-time model could be extracted from this training-time model, so that extra information (such as impurities) does not have to be kept after training.
However, this is really a separate issue, so I removed it.
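The separated design described in the note can be sketched as follows, again in Python for illustration only. The classes `Node` and `LearningNode` echo the names in the note, but their fields and the `predict` and `to_node` methods are hypothetical. A training-time `LearningNode` adds an impurity field, and `to_node` extracts a plain test-time `Node` tree that drops it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical test-time node: only what prediction needs."""
    prediction: float
    split_feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def predict(self, features: Sequence[float]) -> float:
        """Walk from this node to a leaf and return its prediction."""
        node = self
        while node.left is not None:  # internal nodes are assumed to have both children
            go_left = features[node.split_feature] <= node.threshold
            node = node.left if go_left else node.right
        return node.prediction


@dataclass
class LearningNode(Node):
    """Hypothetical training-time node that adds learning metadata."""
    impurity: float = 0.0

    def to_node(self) -> Node:
        """Recursively copy the tree into plain Nodes, dropping training-only fields."""
        return Node(
            prediction=self.prediction,
            split_feature=self.split_feature,
            threshold=self.threshold,
            left=self.left.to_node() if self.left is not None else None,
            right=self.right.to_node() if self.right is not None else None,
        )


# Usage: a one-split training tree, then the extracted test-time model.
train_root = LearningNode(
    prediction=0.0, split_feature=0, threshold=2.5, impurity=0.5,
    left=LearningNode(prediction=1.0, impurity=0.0),
    right=LearningNode(prediction=0.0, impurity=0.1),
)
model = train_root.to_node()
print(model.predict([1.0]), model.predict([3.0]))
```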


Issue Links

links to

[Github] Pull Request #2341 (jkbradley)


People

Assignee: Joseph K. Bradley

Reporter: Joseph K. Bradley

Votes: 0

Watchers: 3

Dates

Created: 20/Aug/14 22:22

Updated: 12/Sep/14 08:39

Resolved: 12/Sep/14 08:39