Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- SystemML 1.0.0
- None
- None
- Sprint 5
Description
For RemoteParForSpark, the tasks are parallelized without considering the data locality of the input matrices. When the input data is large, this causes a lot of data shuffling, because a task scheduled on one executor may have to fetch its input blocks from other executors.
We can predict the data location of the input matrices and attach this location information when parallelizing the ParFor program body.
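A minimal sketch of the idea, written in Python for brevity. It assumes a hypothetical model in which each ParFor task's input matrix blocks, the hosts holding each block, and the block sizes are already known. The names `preferred_hosts`, `tasks_with_locations` and `remote_bytes` and the byte-weighted ranking are illustrative only and are not SystemML's implementation. The output pairs each task with its preferred hostnames. That is the shape accepted by Spark's `SparkContext.makeRDD(Seq[(T, Seq[String])])` overload, which is one possible way to pass such preferences to the scheduler.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical inputs (assumptions, not SystemML data structures):
#   block_hosts: matrix block id -> executor hosts holding a copy of that block
#   block_bytes: matrix block id -> size of that block in bytes


def preferred_hosts(task_blocks: List[str],
                    block_hosts: Dict[str, List[str]],
                    block_bytes: Dict[str, int],
                    top_k: int = 1) -> List[str]:
    """Return up to top_k hosts that already hold the most input bytes for one task."""
    local_bytes: Dict[str, int] = defaultdict(int)
    for block in task_blocks:
        for host in block_hosts.get(block, []):
            local_bytes[host] += block_bytes.get(block, 0)
    # Rank by descending local bytes; break ties by host name so results are deterministic.
    ranked = sorted(local_bytes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [host for host, _ in ranked[:top_k]]


def tasks_with_locations(tasks: List[Tuple[int, List[str]]],
                         block_hosts: Dict[str, List[str]],
                         block_bytes: Dict[str, int],
                         top_k: int = 1) -> List[Tuple[int, List[str]]]:
    """Pair each (task_id, input blocks) with its preferred hosts."""
    return [(task_id, preferred_hosts(blocks, block_hosts, block_bytes, top_k))
            for task_id, blocks in tasks]


def remote_bytes(task_blocks: List[str], host: str,
                 block_hosts: Dict[str, List[str]],
                 block_bytes: Dict[str, int]) -> int:
    """Bytes a task would have to fetch over the network if it ran on `host`."""
    return sum(block_bytes[b] for b in task_blocks
               if host not in block_hosts.get(b, []))


# Hypothetical example: X is split across two nodes, Y is replicated on both.
block_hosts = {"X_0": ["node1"], "X_1": ["node2"], "Y_0": ["node1", "node2"]}
block_bytes = {"X_0": 800, "X_1": 800, "Y_0": 100}
tasks = [(1, ["X_0", "Y_0"]), (2, ["X_1", "Y_0"])]

print(tasks_with_locations(tasks, block_hosts, block_bytes))
# [(1, ['node1']), (2, ['node2'])]
print(remote_bytes(["X_0", "Y_0"], "node2", block_hosts, block_bytes))
# 800 bytes fetched remotely if task 1 ignored locality and ran on node2
```

Hosts are ranked by local bytes rather than by block count, so a task follows its largest inputs. In Spark, location preferences are only scheduling hints. A task can still run on a non-preferred executor once the locality wait (`spark.locality.wait`) expires.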
Issue Links
- is a child of: SYSTEMDS-1809 Optimize the performance of the distributed MNIST_LeNet_Sgd model training (Open)