Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- SystemML 0.11
- None
Description
This JIRA issue tracks the performance comparison of the LeNet scripts with and without the SystemML-NN library. The goal is for both to perform equally in terms of accuracy and runtime. Any difference will indicate areas for engine improvement.
Scripts:
- mnist_lenet-train.dml - LeNet script that does use the SystemML-NN library.
- lenet-train.dml - LeNet script that does not use the SystemML-NN library.
Current Status - Forced Singlenode:
The two scripts perform equally when run in standalone mode with the -exec singlenode flag, 20GB of memory, and data inputs in the SystemML binary format. See run.sh and perf.sh for details.
Results:
- Run #1:
  | Script | Time (s) | Accuracy |
  |---|---|---|
  | mnist_lenet-train.dml | 2987.400704441 | 99.32% |
  | lenet-train.dml | 2816.369435579 | 99.28% |
- Run #2:
  | Script | Time (s) | Accuracy |
  |---|---|---|
  | mnist_lenet-train.dml | 2847.790531812 | 99.16% |
  | lenet-train.dml | 2950.520494210 | 99.18% |
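So, the accuracy is the same and the runtime is effectively the same in singlenode mode: the runtime differences are a few percent and flip direction between the two runs, so neither script is consistently faster!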
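As a quick check on the numbers above, the relative runtime gap between the two scripts can be computed per run. This is a minimal sketch: the timings are copied from the results above, while the helper name `relative_gap` and the choice of the non-NN script as the baseline are assumptions made for illustration.

```python
# Timings in seconds, copied from the two runs reported above.
runs = {
    "Run #1": {"mnist_lenet-train.dml": 2987.400704441, "lenet-train.dml": 2816.369435579},
    "Run #2": {"mnist_lenet-train.dml": 2847.790531812, "lenet-train.dml": 2950.520494210},
}


def relative_gap(with_nn: float, without_nn: float) -> float:
    """Signed runtime gap of the NN-library script relative to the plain script.

    Positive means the NN-library script was slower. (Hypothetical helper.)
    """
    return (with_nn - without_nn) / without_nn


gaps = {
    name: relative_gap(t["mnist_lenet-train.dml"], t["lenet-train.dml"])
    for name, t in runs.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.1%}")
# Run #1: +6.1%
# Run #2: -3.5%
```

The sign changes between the runs, which is what one would expect from run-to-run noise rather than a systematic overhead in either script.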
Current Status - Spark Local:
The two scripts now have the same performance in Spark local mode (non-singlenode). This matches the performance in forced singlenode mode, because only CP jobs are created!
—
To fully reproduce, I did the following:
- Created a directory and placed the two attached bash scripts in it.
- Grabbed a copy of the NN library and placed it into the directory.
- Ran the examples/get_mnist_data.sh script from the library to get the data (placed into examples/data).
- Copied examples/data to the base directory as well.
- Used the attached convert.dml to create binary copies of the data for both scripts.
- Ran run.sh.
Adjust the EXEC and related variables in perf.sh to switch between standalone, Spark, memory sizes, explain, stats, etc.
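The kind of switching that perf.sh does through its EXEC and related variables could look roughly like the sketch below. It is not the attached perf.sh: the function name `build_command`, the mode names, and the `SystemML.jar` path are hypothetical. It also assumes the usual SystemML entry points, namely the `org.apache.sysml.api.DMLScript` main class with the `-f`, `-exec`, `-explain`, and `-stats` options, and `spark-submit` for Spark local mode.

```python
def build_command(script, mode="standalone", mem_gb=20,
                  explain=False, stats=False, jar="SystemML.jar"):
    """Build an argument list for running a DML script (illustrative only).

    `jar` is a placeholder path; `mode` is either "standalone" (forced
    singlenode) or "spark-local". Both mode names are made up for this sketch.
    """
    dml_args = ["-f", script]
    if mode == "standalone":
        # Plain JVM run with a fixed heap, forced into singlenode execution.
        cmd = ["java", f"-Xmx{mem_gb}g", "-cp", jar,
               "org.apache.sysml.api.DMLScript"]
        dml_args += ["-exec", "singlenode"]
    elif mode == "spark-local":
        # Spark local mode; the execution mode is left at its default.
        cmd = ["spark-submit", "--master", "local[*]",
               "--driver-memory", f"{mem_gb}g", jar]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if explain:
        dml_args.append("-explain")
    if stats:
        dml_args.append("-stats")
    return cmd + dml_args


# Print the command lines that would be compared, without executing them.
for dml in ("mnist_lenet-train.dml", "lenet-train.dml"):
    print(" ".join(build_command(dml, mode="standalone", stats=True)))
```

Returning an argument list rather than running anything keeps the sketch free of side effects; the list can be passed to `subprocess.run` once the jar path and memory size match the local setup.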
Attachments
Issue Links
- relates to SYSTEMDS-618 Deep Learning DML Library (In Progress)