[SPARK-3157] Avoid duplicated stats in DecisionTree extractLeftRightNodeAggregates - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.2.0
Component/s: MLlib
Labels: None

Description

Improvement: computation, memory usage

For ordered features, extractLeftRightNodeAggregates() computes a pair of cumulative sums: one accumulating from the left end and one from the right end. The pair is redundant, since the right-hand sum at any split equals the overall total minus the left-hand sum. Compute only one of the two.
For unordered features, the left and right aggregates are essentially the same data, copied from the original aggregates but shifted by one index. Avoid copying the data.
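
To illustrate the ordered-feature point, here is a minimal sketch, assuming each bin holds a single numeric statistic. The object and method names (`CumulativeSplitStats`, `leftCumulative`, `leftStat`, `rightStat`) are hypothetical and are not taken from the Spark code base; the sketch only shows that one left-to-right cumulative sum is enough to recover both sides of every split.

```scala
// Hypothetical illustration; not the actual MLlib implementation.
object CumulativeSplitStats {

  /** Left-to-right cumulative sums over the per-bin statistics of one ordered feature. */
  def leftCumulative(binStats: Array[Double]): Array[Double] =
    binStats.scanLeft(0.0)(_ + _).tail

  /** Statistic for bins 0..splitIndex (the left side of the split). */
  def leftStat(cum: Array[Double], splitIndex: Int): Double =
    cum(splitIndex)

  /** Statistic for the remaining bins, derived as total minus left, so no second array is needed. */
  def rightStat(cum: Array[Double], splitIndex: Int): Double =
    cum.last - cum(splitIndex)
}
```

With per-bin statistics `Array(1.0, 2.0, 3.0, 4.0)`, a split after bin 1 would be read as `leftStat(cum, 1)` and `rightStat(cum, 1)` from the same array, which halves the storage compared with keeping separate left and right cumulative sums.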

Issue Links

Is contained by: SPARK-3043 DecisionTree aggregation is inefficient (Resolved)

People

Assignee: Joseph K. Bradley
Reporter: Joseph K. Bradley
Votes: 0
Watchers: 3

Dates

Created: 20/Aug/14 22:19
Updated: 21/Jul/15 17:43
Resolved: 21/Jul/15 17:43