[SPARK-3272] Calculate prediction for nodes separately from calculating information gain for splits in decision tree - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.2
Fix Version/s: 1.2.0
Component/s: MLlib
Labels: None
Target Version/s: 1.2.0

Description

In the current implementation, the prediction for a node is calculated together with the information gain stats for each possible split. However, the value to predict for a node is determined by the node alone, regardless of which split is chosen.
To save computation, we can calculate the prediction first and then calculate the information gain stats for each split.
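
A minimal sketch of the idea, not the MLlib code: it assumes a classification node with Gini impurity, and every name in it (`gini`, `predict`, `information_gain`, `evaluate_node`, and the boolean-mask representation of a split) is hypothetical. The prediction is derived once from the node's label counts, and only the gain is computed per split.

```python
from collections import Counter


def gini(counts):
    """Gini impurity of a mapping from label to count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def predict(counts):
    """Majority label of the node; depends only on the node's own labels."""
    return max(counts, key=counts.get)


def information_gain(parent, left, right):
    """Impurity decrease obtained by splitting parent into left and right."""
    total = sum(parent.values())
    n_left = sum(left.values())
    n_right = sum(right.values())
    return (gini(parent)
            - (n_left / total) * gini(left)
            - (n_right / total) * gini(right))


def evaluate_node(labels, splits):
    """Return (prediction, gains).

    Each split is a list of booleans (hypothetical representation):
    True sends the instance at that position to the left child.
    """
    parent = Counter(labels)
    prediction = predict(parent)  # computed once, outside the split loop
    gains = []
    for mask in splits:
        left = Counter(l for l, go_left in zip(labels, mask) if go_left)
        right = Counter(l for l, go_left in zip(labels, mask) if not go_left)
        gains.append(information_gain(parent, left, right))
    return prediction, gains
```

With this shape, the loop body does no work related to the prediction, so adding more candidate splits does not repeat it.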

This is also necessary if we want to support a minimum-instances-per-node parameter (SPARK-2207): when no split satisfies the minimum instances requirement, we don't use the information gain of any split, so there must be another way to get the prediction value.
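
The following sketch illustrates that case under stated assumptions; it is not the SPARK-2207 implementation. The function `choose_valid_splits`, the parameter name `min_instances_per_node`, and the boolean-mask split representation are all hypothetical. When every candidate split is rejected, the node still has a prediction because the prediction never depended on a split.

```python
from collections import Counter


def choose_valid_splits(labels, splits, min_instances_per_node):
    """Return (prediction, indices of splits that satisfy the minimum).

    Each split is a list of booleans (hypothetical representation):
    True sends the instance at that position to the left child.
    """
    counts = Counter(labels)
    # The prediction comes from the node's labels alone.
    prediction = max(counts, key=counts.get)

    valid = []
    for index, mask in enumerate(splits):
        n_left = sum(1 for go_left in mask if go_left)
        n_right = len(mask) - n_left
        # Keep a split only if both children are large enough.
        if n_left >= min_instances_per_node and n_right >= min_instances_per_node:
            valid.append(index)

    # If valid is empty, the node becomes a leaf and uses the prediction.
    return prediction, valid
```

For example, with five instances and a single split that isolates one instance, a minimum of 2 rejects the split and the node falls back to its majority label.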

Issue Links

links to

[Github] Pull Request #2332 (chouqin)

People

Assignee: Qiping Li

Reporter: Qiping Li

Votes: 0

Watchers: 4

Dates

Created: 28/Aug/14 02:49

Updated: 10/Sep/14 22:38

Resolved: 10/Sep/14 22:38