Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
Description
Many machine learning algorithms have to treeAggregate large vectors or arrays because of their large number of features. Unfortunately, RDD's treeAggregate operation becomes inefficient when the dimension of these vectors or arrays grows beyond a million: such high-dimensional vectors often occupy more than 100 MB of memory, and transferring a 100 MB element between executors is very slow in Spark.
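To illustrate why the full vector is shipped at every level of the reduction, here is a minimal, Spark-free sketch of the treeAggregate pattern: partitions are reduced locally, then the partial results are combined pairwise in rounds. The function name `tree_aggregate` and the use of plain Python lists in place of RDD partitions are illustrative assumptions, not Spark's actual implementation.

```python
import numpy as np

def tree_aggregate(partitions, zero, seq_op, comb_op):
    """Sketch of a tree-style aggregation over a list of partitions.

    Each partition is first reduced locally with seq_op (cheap, stays on
    one "executor"); the per-partition partials are then merged pairwise
    in rounds with comb_op. In Spark, every merge in these rounds moves a
    full dense partial vector between executors -- for a float64 vector,
    ~13 million elements is already over 100 MB per transfer.
    """
    partials = [seq_op(zero.copy(), part) for part in partitions]
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials), 2):
            if i + 1 < len(partials):
                merged.append(comb_op(partials[i], partials[i + 1]))
            else:
                merged.append(partials[i])  # odd one out, carried forward
        partials = merged
    return partials[0] if partials else zero

# Toy usage: 5 partitions, each holding 3 unit-gradient updates of dim 4.
dim = 4
parts = [[np.ones(dim)] * 3 for _ in range(5)]
result = tree_aggregate(
    parts,
    np.zeros(dim),
    seq_op=lambda acc, updates: acc + sum(updates),
    comb_op=lambda a, b: a + b,
)
# Every element of result is 15 (5 partitions * 3 updates).
```

Note that each round halves the number of partials but every merge still moves a full `dim`-sized vector, so the per-transfer cost is fixed by the dimension, not by the amount of remaining work.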