Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
*Issue*
In MLlib, we have assumed that RDD.treeAggregate allows the seqOp and combOp functions to modify and return their first argument, just as RDD.aggregate does. However, this behavior is not documented.
I started to add docs to this effect, but then noticed that treeAggregate uses reduceByKey and reduce in its implementation, and neither of those technically allows the seqOp/combOp functions to modify and return their first arguments.
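To illustrate the concern with reduce, here is a minimal plain-Scala sketch with no Spark dependency. It assumes that, as we understand Spark's RDD.reduce, each partition is combined left to right with the partition's first element used directly as the accumulator. A local List stands in for one (possibly cached) partition, and `addInPlace` is a hypothetical combOp of the allocation-saving kind MLlib uses.

```scala
object ReduceHazardSketch {
  // Hypothetical combOp that mutates and returns its first argument,
  // the allocation-saving pattern that RDD.aggregate permits.
  def addInPlace(a: Array[Double], b: Array[Double]): Array[Double] = {
    var i = 0
    while (i < a.length) {
      a(i) += b(i)
      i += 1
    }
    a
  }

  // Returns the reduced result and the first input element after the reduce.
  def demo(): (Array[Double], Array[Double]) = {
    // Stand-in for the contents of a single partition.
    val partition = List(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
    // reduceLeft uses the first element itself as the accumulator,
    // so addInPlace overwrites that input element.
    val result = partition.reduceLeft[Array[Double]](addInPlace)
    (result, partition.head)
  }

  def main(args: Array[String]): Unit = {
    val (result, firstInput) = demo()
    println(result.mkString(", "))     // 9.0, 12.0
    println(firstInput.mkString(", ")) // 9.0, 12.0 (the input was overwritten)
  }
}
```

The result is correct, but the first input element now holds the running sum, so any later reuse of that data (for example, a cached partition) would see corrupted values.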
*Question*: Is the implementation safe, or does it need to be updated?
*Decision*: Avoid using reduce. Use fold instead.
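A minimal plain-Scala sketch of why fold sidesteps the problem. It follows the contract stated in the Spark API docs for RDD.fold (the op may modify and return its first argument, but not its second) and assumes, to our understanding, that each task works on its own copy of the zero value. Here a hypothetical `zero()` factory returns a fresh array for each stand-in partition and for the final combine; `addInPlace` is likewise hypothetical, and local collections stand in for RDD partitions.

```scala
object FoldFixSketch {
  // Hypothetical in-place combOp: mutates and returns its first argument only.
  def addInPlace(a: Array[Double], b: Array[Double]): Array[Double] = {
    var i = 0
    while (i < a.length) {
      a(i) += b(i)
      i += 1
    }
    a
  }

  // Each call returns a new array, standing in for a per-task copy of zeroValue.
  def zero(): Array[Double] = Array(0.0, 0.0)

  def demo(): (Array[Double], Seq[Seq[Array[Double]]]) = {
    // Two stand-in partitions of input vectors.
    val partitions: Seq[Seq[Array[Double]]] = Seq(
      Seq(Array(1.0, 2.0), Array(3.0, 4.0)),
      Seq(Array(5.0, 6.0))
    )
    // Per-partition fold: only the fresh zero copy is mutated.
    val partials = partitions.map(p => p.foldLeft(zero())(addInPlace))
    // The final combine also folds from a fresh zero instead of reducing,
    // so the partial results are only ever passed as the second argument.
    val result = partials.foldLeft(zero())(addInPlace)
    (result, partitions)
  }

  def main(args: Array[String]): Unit = {
    val (result, partitions) = demo()
    println(result.mkString(", ")) // 9.0, 12.0
    // Prints 1.0,2.0 | 3.0,4.0 | 5.0,6.0: the inputs are untouched.
    println(partitions.flatten.map(_.mkString(",")).mkString(" | "))
  }
}
```

The design choice is that the accumulator is always an object created for the aggregation itself, so an in-place combOp never writes into caller-owned data.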