Details
- Type: Improvement
- Status: Resolved
- Priority: Critical
- Resolution: Incomplete
Description
Improvement: communication
Currently, every level of a DecisionTree is trained in a distributed manner. However, at deeper levels of the tree, it is likely that only a small subset of the training data reaches any given node. If that node's training data can fit in a single machine's memory, it may be more efficient to shuffle the data and train the rest of the subtree rooted at that node locally.
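For illustration, a minimal sketch of the memory-fit check, assuming a hypothetical Instance type, a routing predicate for the node, and a rough per-instance size estimate; none of these names are part of Spark's actual DecisionTree internals:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical labeled training instance (not Spark's internal representation).
case class Instance(label: Double, features: Array[Double])

object LocalTrainingCheck {
  // Rough per-instance footprint estimate in bytes; a real implementation
  // would derive this from the binned feature representation.
  def estimatedBytesPerInstance(numFeatures: Int): Long = 8L * numFeatures + 64L

  /** Count the instances that reach a node and compare the estimated
   *  footprint against a per-machine memory budget. */
  def fitsOnOneMachine(
      data: RDD[Instance],
      reachesNode: Instance => Boolean,
      numFeatures: Int,
      maxLocalBytes: Long): Boolean = {
    val numInstances = data.filter(reachesNode).count()
    numInstances * estimatedBytesPerInstance(numFeatures) <= maxLocalBytes
  }
}
```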
Note: Local training may become feasible at different levels in different branches of the tree. There are multiple options for handling this case:
(1) Train in a distributed fashion until all remaining nodes can be trained locally. This would entail training multiple levels at once (locally).
(2) Train branches locally when possible, and interleave this with distributed training of the other branches.
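A rough sketch of option (2), reusing the hypothetical types from the check above: at each iteration the frontier of unfinished nodes is split into nodes small enough to collect and train locally and nodes that remain in the distributed level-wise pass. The node and routing abstractions here are placeholders, not the actual MLlib implementation.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical handle for a tree node awaiting training.
case class PendingNode(nodeId: Int)

object InterleavedTraining {
  /** Split the current frontier into (locally trainable, still distributed),
   *  based on the memory-fit check from the previous sketch. */
  def splitFrontier(
      data: RDD[Instance],
      frontier: Seq[PendingNode],
      routesTo: (Instance, PendingNode) => Boolean,
      numFeatures: Int,
      maxLocalBytes: Long): (Seq[PendingNode], Seq[PendingNode]) = {
    frontier.partition { node =>
      LocalTrainingCheck.fitsOnOneMachine(
        data, inst => routesTo(inst, node), numFeatures, maxLocalBytes)
    }
  }

  /** For each locally trainable node, collect its instances to one place
   *  and finish the whole subtree there (subtree training itself omitted). */
  def trainLocally(
      data: RDD[Instance],
      node: PendingNode,
      routesTo: (Instance, PendingNode) => Boolean): Array[Instance] = {
    data.filter(inst => routesTo(inst, node)).collect()
  }
}
```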
Attachments
Issue Links
- Is contained by: SPARK-14045 DecisionTree improvement umbrella (Resolved)
- Is related to: SPARK-14043 Remove restriction on maxDepth for decision trees (Resolved)
- Links to