Details
- Type: Task
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: SystemML 1.0.0
- Sprint: Sprint 3
Description
For the current version, the distributed MNIST LeNet SGD model training can be optimized in the following respects:
- Optimize the DML scripts with the backend engine in mind; for example, intermediate matrices are exported to HDFS, so unnecessary intermediates should be avoided.
- Improve the efficiency of matrix subsetting.
- Data locality: for RemoteParForSpark, tasks are parallelized without considering data locality, which causes heavy data shuffling when the input data is large.
- Result merge: current experiments indicate that the result-merge phase takes longer than the model training itself.
After these optimizations, we should compare the performance against distributed TensorFlow.
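To illustrate the data-locality point, below is a minimal sketch (plain Python; all names and the greedy heuristic are hypothetical, not the actual RemoteParForSpark implementation) of assigning ParFor tasks to executors based on where their input blocks already reside, instead of parallelizing tasks without regard to block placement:

```python
from collections import defaultdict

def assign_tasks(tasks, block_locations):
    """Greedily assign each task to the executor hosting the most of
    its input blocks; fall back to the least-loaded executor when no
    input block is local.

    tasks: dict mapping task_id -> list of input block ids
    block_locations: dict mapping block_id -> executor id hosting it
    Returns a dict mapping task_id -> executor id.
    """
    load = defaultdict(int)   # tasks already assigned per executor
    assignment = {}
    for task_id, blocks in tasks.items():
        # Count how many of this task's input blocks live on each executor.
        local = defaultdict(int)
        for b in blocks:
            if b in block_locations:
                local[block_locations[b]] += 1
        if local:
            # Prefer the executor with the most local blocks;
            # break ties toward the less-loaded executor.
            executor = max(local, key=lambda e: (local[e], -load[e]))
        else:
            executor = min(load, key=load.get) if load else "exec-0"
        assignment[task_id] = executor
        load[executor] += 1
    return assignment
```

For example, with blocks `b1`, `b2` on `exec-A` and `b3` on `exec-B`, a task reading `b1` and `b2` is placed on `exec-A` and a task reading `b3` on `exec-B`, so neither task shuffles its input over the network. A real scheduler would also have to bound per-executor skew, which this sketch only approximates through the load tie-breaker.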
Issue Links
- is a parent of:
  - SYSTEMDS-1821 Improve the training process in mnist_lenet_distrib_sgd.dml (Open)
  - SYSTEMDS-1830 Improve the data locality for the tasks in ParFor body (Open)
  - SYSTEMDS-1831 Improve the efficiency of matrix subsetting (Open)