SYSTEMDS-1809: Optimize the performance of the distributed MNIST_LeNet_Sgd model training


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: SystemML 1.0.0
    • Fix Version/s: None
    • Component/s: None
    • Sprint: Sprint 3

    Description

      In the current version, the distributed MNIST_LeNet_Sgd model training can be optimized in the following areas:

      1. Optimize the DML scripts with the backend engine in mind; for example, intermediate matrices are exported to HDFS, so unnecessary intermediates need to be avoided.
      2. Improve the efficiency of matrix subsetting.
      3. Data locality: in RemoteParForSpark, tasks are parallelized without considering data locality, which causes heavy data shuffling when the input data is large.
      4. Result merge: current experiments indicate that the result-merge phase takes more time than the model training itself (a minimal DML sketch of the overall parfor pattern follows this list).
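
      For orientation, here is a minimal DML sketch of the parfor pattern in which aspects 1, 2, and 4 show up. The batch size, the linear stand-in for the LeNet forward pass, and all variable names are illustrative assumptions, not the actual mnist_lenet scripts.

          # Hypothetical reduced example (not the actual mnist_lenet scripts):
          # a parfor loop over row batches, with right indexing (aspect 2) and a
          # shared result variable whose updates trigger result merge (aspect 4).
          X = read($X)                      # images, one row per example
          Y = read($Y)                      # one-hot labels
          N = nrow(X)
          B = 512                           # batch size: an assumed value
          nbatches = as.integer(ceil(N / B))

          # stand-in linear model in place of the LeNet forward/backward pass
          W = rand(rows = ncol(X), cols = ncol(Y), seed = 42)

          # shared result variable, merged across workers after the parfor
          losses = matrix(0, rows = nbatches, cols = 1)

          parfor (b in 1:nbatches) {
            beg = (b - 1) * B + 1
            end = min(b * B, N)
            X_batch = X[beg:end, ]          # matrix subsetting (aspect 2)
            Y_batch = Y[beg:end, ]

            # keep intermediates local to the iteration so they are not
            # needlessly exported to HDFS (aspect 1)
            scores = X_batch %*% W
            losses[b, 1] = sum((scores - Y_batch) ^ 2)   # disjoint writes -> result merge (aspect 4)
          }

          write(losses, $out)

      In the real scripts the parfor body computes the LeNet gradients and the merged results are the updated model parameters, which is presumably where the expensive result-merge phase reported above arises; the sketch only mirrors the structure, not the actual computation.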

      After these optimizations, we need to compare the performance with distributed TensorFlow.


            People

              Assignee: Tenma Fei Hu
              Reporter: Tenma Fei Hu
