Details
- Type: Task
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: SystemML 1.0.0
- Sprint: Sprint 3
Description
For the current version, the distributed MNIST LeNet SGD model training can be optimized in the following respects:
- Optimize the DML scripts with the backend engine in mind; for example, intermediate matrices are exported to HDFS, so unnecessary intermediates should be avoided.
- Improve the efficiency of matrix subsetting.
- Data locality: for RemoteParForSpark, tasks are parallelized without considering data locality, which causes heavy data shuffling when the input data is large.
- Result merge: current experiments indicate that the result-merge phase takes longer than the model training itself.
After these optimizations, we should compare the performance against distributed TensorFlow.
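To illustrate the data-locality point, below is a minimal sketch (plain Python; all names and the greedy heuristic are hypothetical, not the actual RemoteParForSpark implementation) of assigning ParFor tasks to executors based on where their input blocks already reside, instead of parallelizing tasks without regard to block placement:

```python
from collections import defaultdict

def assign_tasks(tasks, block_locations):
    """Greedily assign each task to the executor hosting the most of
    its input blocks; fall back to the least-loaded executor when no
    input block is local.

    tasks: dict mapping task_id -> list of input block ids
    block_locations: dict mapping block_id -> executor id hosting it
    Returns a dict mapping task_id -> executor id.
    """
    load = defaultdict(int)   # tasks already assigned per executor
    assignment = {}
    for task_id, blocks in tasks.items():
        # Count how many of this task's input blocks live on each executor.
        local = defaultdict(int)
        for b in blocks:
            if b in block_locations:
                local[block_locations[b]] += 1
        if local:
            # Prefer the executor with the most local blocks;
            # break ties toward the less-loaded executor.
            executor = max(local, key=lambda e: (local[e], -load[e]))
        else:
            executor = min(load, key=load.get) if load else "exec-0"
        assignment[task_id] = executor
        load[executor] += 1
    return assignment
```

For example, with blocks `b1`, `b2` on `exec-A` and `b3` on `exec-B`, a task reading `b1` and `b2` is placed on `exec-A` and a task reading `b3` on `exec-B`, so neither task shuffles its input over the network. A real scheduler would also have to bound per-executor skew, which this sketch only approximates through the load tie-breaker.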
Issue Links
- is a parent of:
  - SYSTEMDS-1821 Improve the training process in mnist_lenet_distrib_sgd.dml (Open)
  - SYSTEMDS-1830 Improve the data locality for the tasks in ParFor body (Open)
  - SYSTEMDS-1831 Improve the efficiency of matrix subsetting (Open)