[SPARK-14408] Update RDD.treeAggregate not to use reduce

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.3.0
Component/s: ML, MLlib, Spark Core
Labels: None

Description

*Issue*
In MLlib, we have assumed that RDD.treeAggregate allows the seqOp and combOp functions to modify and return their first argument, just like RDD.aggregate. However, it is not documented that way.

I started to add docs to this effect, but then noticed that treeAggregate uses reduceByKey and reduce in its implementation, neither of which technically allows seqOp or combOp to modify and return its first argument.

*Question*: Is the implementation safe, or does it need to be updated?

*Decision*: Avoid using reduce. Use fold instead.
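
To illustrate why the choice between reduce and fold matters here, the following is a minimal sketch in plain Python, not Spark code and not the treeAggregate implementation. The names `merge_in_place`, `reduce_style`, and `fold_style` are hypothetical and exist only for this example. It assumes a combine function that mutates and returns its first argument, as MLlib's ops are described above. With a reduce-style call, the first input element becomes the accumulator and gets overwritten. With a fold-style call, a fresh zero value is the accumulator and the inputs are only read.

```python
from functools import reduce


def merge_in_place(acc, other):
    """Combine op that mutates and returns its first argument."""
    for i, value in enumerate(other):
        acc[i] += value
    return acc


def reduce_style(partials):
    # No zero value: the first input element doubles as the accumulator,
    # so an in-place op overwrites that input element.
    return reduce(merge_in_place, partials)


def fold_style(partials, zero_factory):
    # A fresh zero value is the accumulator; the inputs are only read.
    return reduce(merge_in_place, partials, zero_factory())


data = [[1, 2], [3, 4], [5, 6]]

folded = fold_style(data, lambda: [0, 0])
inputs_intact_after_fold = data == [[1, 2], [3, 4], [5, 6]]

reduced = reduce_style(data)
first_input_overwritten = data[0] is reduced
```

In this sketch both calls produce the same sum, but only the reduce-style call leaves `data[0]` changed afterwards. That is the kind of side effect that is harmless for a throwaway accumulator and unsafe for values that may be read again.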

Issue Links

links to

[Github] Pull Request #12217 (jkbradley)

[Github] Pull Request #18198 (HyukjinKwon)

People

Assignee: Joseph K. Bradley

Reporter: Joseph K. Bradley

Dates

Created: 05/Apr/16 18:03

Updated: 09/Jun/17 07:53

Resolved: 09/Jun/17 07:53