Details
- Type: Improvement
- Priority: Critical
- Status: Resolved
- Resolution: Won't Fix
Description
The current implementations of machine learning algorithms rely on the driver for some of the computation and for broadcasting data. This creates a bottleneck at the driver for both computation and communication, especially in multi-model training. An efficient implementation of AllReduce (or AllAggregate) would help free the driver:
allReduce[T](rdd: RDD[T], op: (T, T) => T): RDD[T]
This JIRA was created to discuss how to implement AllReduce efficiently, and possible alternatives.
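For discussion, below is a minimal sketch of how allReduce could be emulated today with existing primitives. The object name NaiveAllReduce, the rdd/op parameter names, and the choice of treeReduce plus a broadcast are illustrative assumptions, not a proposed implementation; note that the reduced value still passes through the driver before being re-broadcast, which is exactly the bottleneck this issue aims to remove.

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper (not part of Spark): emulate allReduce with existing
// primitives. treeReduce limits per-step fan-in, but the final value still
// lands on the driver before being broadcast back to the executors.
object NaiveAllReduce {
  def allReduce[T: ClassTag](rdd: RDD[T], op: (T, T) => T): RDD[T] = {
    val sc = rdd.sparkContext
    val total = rdd.treeReduce(op)   // aggregate via a reduction tree
    val bc = sc.broadcast(total)     // ship the result back to the executors
    // Return an RDD holding one copy of the aggregate per original partition.
    rdd.mapPartitions(_ => Iterator.single(bc.value), preservesPartitioning = true)
  }
}

// Example usage: sum doubles across all partitions, result visible everywhere.
// val summed: RDD[Double] = NaiveAllReduce.allReduce(data, _ + _)

A driver-free AllReduce would instead exchange partial aggregates directly between executors (e.g. ring or butterfly patterns), which is what this issue proposes to explore.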
Attachments
Issue Links
- is related to:
  - SPARK-24374 SPIP: Support Barrier Execution Mode in Apache Spark (Resolved)
  - SPARK-2174 Implement treeReduce and treeAggregate (Resolved)