SystemDS / SYSTEMDS-1166

Bad performance of TSMM


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: SystemML 0.13
    • Component/s: Runtime
    • Labels: None
    • Environment: spark 1.6.1, centos

    Description

      I'm running the following simple DML code:

      N = 30000
      L = Rand(cols=N, max=1, min=-1, pdf="uniform", rows=N)
      LLT = L %*% t(L)
      fn = sum(LLT * LLT)
      print(fn)
      

      Line 3 (`LLT = L %*% t(L)`) is translated into a `tsmm RIGHT` instruction, and it takes almost 30 minutes to run in both standalone and Spark modes.
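For reference, here is a minimal NumPy sketch (not SystemML code) of what the two tsmm variants compute, using a small stand-in for the 30000 x 30000 matrix:

```python
import numpy as np

# Small stand-in for the 30000 x 30000 uniform(-1, 1) input matrix.
rng = np.random.default_rng(0)
L = rng.uniform(-1.0, 1.0, size=(4, 4))

# tsmm RIGHT computes L %*% t(L); tsmm LEFT computes t(L) %*% L.
right = L @ L.T   # L %*% t(L)
left = L.T @ L    # t(L) %*% L

# Both results are symmetric, which is why a dedicated tsmm kernel
# only needs to compute one triangle and mirror it.
assert np.allclose(right, right.T)
assert np.allclose(left, left.T)

# fn = sum(LLT * LLT) is the squared Frobenius norm of LLT,
# which the runtime computes with the uasqk+ instruction.
fn = np.sum(right * right)
assert np.isclose(fn, np.linalg.norm(right, 'fro') ** 2)
```

Either way the output matrix is N x N and symmetric; only the access pattern over the input differs between the two variants.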

      Here is the output from explain and stats:

      > /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit --master=spark://rr-ram4.softlayer.com:7077 --executor-memory=20g --driver-memory=60g --executor-cores=4 --num-executors=20 sysml/target/SystemML.jar -f tsmm_verify.dml -explain -stats
      17/01/05 18:11:10 INFO api.DMLScript: BEGIN DML run 01/05/2017 18:11:10
      17/01/05 18:11:10 INFO conf.DMLConfig: Using internal default configuration settings.  If you wish to customize any settings, please supply a `SystemML-config.xml` file.
      17/01/05 18:11:10 INFO api.DMLScript: EXPLAIN (RUNTIME):
      # Memory Budget local/remote = 41216MB/?MB/?MB/?MB
      # Degree of Parallelism (vcores) local/remote = 48/?
      PROGRAM ( size CP/SP = 10/0 )
      --MAIN PROGRAM
      ----GENERIC (lines 8-24) [recompile=false]
      ------CP createvar _mVar1 scratch_space//_p22241_10.168.31.117//_t0/temp1 true MATRIX binaryblock 30000 30000 1000 1000 900000000 copy
      ------CP rand 30000 30000 1000 1000 -1 1 1.0 -1 uniform 1.0 48 _mVar1.MATRIX.DOUBLE
      ------CP createvar _mVar2 scratch_space//_p22241_10.168.31.117//_t0/temp2 true MATRIX binaryblock 30000 30000 1000 1000 -1 copy
      ------CP tsmm _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT 48
      ------CP rmvar _mVar1
      ------CP uasqk+ _mVar2.MATRIX.DOUBLE _Var3.SCALAR.DOUBLE 48
      ------CP rmvar _mVar2
      ------CP print _Var3.SCALAR.DOUBLE.false _Var4.SCALAR.STRING
      ------CP rmvar _Var3
      ------CP rmvar _Var4
      
      17/01/05 18:11:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      6.00000141554168E12
      17/01/05 18:40:38 INFO api.DMLScript: SystemML Statistics:
      Total elapsed time:             1768.639 sec.
      Total compilation time:         0.266 sec.
      Total execution time:           1768.373 sec.
      Number of compiled Spark inst:  0.
      Number of executed Spark inst:  0.
      Cache hits (Mem, WB, FS, HDFS): 2/0/0/0.
      Cache writes (WB, FS, HDFS):    2/1/0.
      Cache times (ACQr/m, RLS, EXP): 0.000/0.001/13.410/0.000 sec.
      HOP DAGs recompiled (PRED, SB): 0/0.
      HOP DAGs recompile time:        0.000 sec.
      Spark ctx create time (lazy):   0.000 sec.
      Spark trans counts (par,bc,col):0/0/0.
      Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
      Total JIT compile time:         1.345 sec.
      Total JVM GC count:             3.
      Total JVM GC time:              4.428 sec.
      Heavy hitter instructions (name, time, count):
      -- 1) tsmm        1762.014 sec   1
      -- 2) rand           5.210 sec   1
      -- 3) uasqk+         0.541 sec   1
      -- 4) rmvar          0.002 sec   4
      -- 5) print          0.002 sec   1
      -- 6) createvar      0.001 sec   2
      
      17/01/05 18:40:38 INFO api.DMLScript: END DML run 01/05/2017 18:40:38
      

      If I calculate t(L) %*% L which is tsmm LEFT, then it takes only 5 min. Here is the output of explain and stats in this case:

      > /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit --master=spark://rr-ram4.softlayer.com:7077 --executor-memory=20g --driver-memory=60g --executor-cores=4 --num-executors=20 sysml/target/SystemML.jar -f tsmm_verify.dml -explain -stats
      17/01/05 18:03:13 INFO api.DMLScript: BEGIN DML run 01/05/2017 18:03:13
      17/01/05 18:03:13 INFO conf.DMLConfig: Using internal default configuration settings.  If you wish to customize any settings, please supply a `SystemML-config.xml` file.
      17/01/05 18:03:13 INFO api.DMLScript: EXPLAIN (RUNTIME):
      # Memory Budget local/remote = 41216MB/?MB/?MB/?MB
      # Degree of Parallelism (vcores) local/remote = 48/?
      PROGRAM ( size CP/SP = 10/0 )
      --MAIN PROGRAM
      ----GENERIC (lines 8-24) [recompile=false]
      ------CP createvar _mVar1 scratch_space//_p19190_10.168.31.117//_t0/temp1 true MATRIX binaryblock 30000 30000 1000 1000 900000000 copy
      ------CP rand 30000 30000 1000 1000 -1 1 1.0 -1 uniform 1.0 48 _mVar1.MATRIX.DOUBLE
      ------CP createvar _mVar2 scratch_space//_p19190_10.168.31.117//_t0/temp2 true MATRIX binaryblock 30000 30000 1000 1000 -1 copy
      ------CP tsmm _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE LEFT 48
      ------CP rmvar _mVar1
      ------CP uasqk+ _mVar2.MATRIX.DOUBLE _Var3.SCALAR.DOUBLE 48
      ------CP rmvar _mVar2
      ------CP print _Var3.SCALAR.DOUBLE.false _Var4.SCALAR.STRING
      ------CP rmvar _Var3
      ------CP rmvar _Var4
      
      17/01/05 18:03:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      5.999491296172791E12
      17/01/05 18:09:14 INFO api.DMLScript: SystemML Statistics:
      Total elapsed time:             361.293 sec.
      Total compilation time:         0.267 sec.
      Total execution time:           361.026 sec.
      Number of compiled Spark inst:  0.
      Number of executed Spark inst:  0.
      Cache hits (Mem, WB, FS, HDFS): 2/0/0/0.
      Cache writes (WB, FS, HDFS):    2/1/0.
      Cache times (ACQr/m, RLS, EXP): 0.000/0.001/12.827/0.000 sec.
      HOP DAGs recompiled (PRED, SB): 0/0.
      HOP DAGs recompile time:        0.000 sec.
      Spark ctx create time (lazy):   0.000 sec.
      Spark trans counts (par,bc,col):0/0/0.
      Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
      Total JIT compile time:         1.32 sec.
      Total JVM GC count:             3.
      Total JVM GC time:              4.271 sec.
      Heavy hitter instructions (name, time, count):
      -- 1) tsmm         354.384 sec   1
      -- 2) rand           5.181 sec   1
      -- 3) uasqk+         0.842 sec   1
      -- 4) rmvar          0.002 sec   4
      -- 5) createvar      0.001 sec   2
      -- 6) print          0.001 sec   1
      
      17/01/05 18:09:14 INFO api.DMLScript: END DML run 01/05/2017 18:09:14
      

      The time difference between these two cases (roughly 5x) is far too large; there must be a bug in the runtime code causing this.
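Until the runtime issue is fixed, one possible workaround (a sketch, not part of this issue): since `L %*% t(L)` equals `t(M) %*% M` with `M = t(L)`, an explicit transpose should steer the expression to the faster `tsmm LEFT` path, at the cost of one transpose of `L`. A NumPy check of the algebraic equivalence:

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.uniform(-1.0, 1.0, size=(5, 3))

# Original expression: LLT = L %*% t(L), compiled to tsmm RIGHT.
llt_right = L @ L.T

# Rewrite: with M = t(L), t(M) %*% M equals L %*% t(L).
# In DML this would be  M = t(L); LLT = t(M) %*% M  and is
# expected (not verified here) to compile to tsmm LEFT.
M = L.T
llt_left = M.T @ M

assert np.allclose(llt_right, llt_left)
```

Whether the saved tsmm time outweighs the extra transpose would need to be measured; the real fix belongs in the `tsmm RIGHT` kernel.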


People

Assignee: mboehm7 Matthias Boehm
Reporter: iyounus Imran Younus
Votes: 0
Watchers: 1
