Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
- None
- Environment: Spark 1.6.1, CentOS
Description
I'm running the following simple DML script:
N = 30000
L = Rand(cols=N, max=1, min=-1, pdf="uniform", rows=N)
LLT = L %*% t(L)
fn = sum(LLT * LLT)
print(fn)
Line 3 (LLT = L %*% t(L)) is translated to tsmm RIGHT, and it takes almost 30 minutes to run in both standalone and Spark modes.
Here is the output from explain and stats:
> /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit --master=spark://rr-ram4.softlayer.com:7077 --executor-memory=20g --driver-memory=60g --executor-cores=4 --num-executors=20 sysml/target/SystemML.jar -f tsmm_verify.dml -explain -stats
17/01/05 18:11:10 INFO api.DMLScript: BEGIN DML run 01/05/2017 18:11:10
17/01/05 18:11:10 INFO conf.DMLConfig: Using internal default configuration settings. If you wish to customize any settings, please supply a `SystemML-config.xml` file.
17/01/05 18:11:10 INFO api.DMLScript: EXPLAIN (RUNTIME):
# Memory Budget local/remote = 41216MB/?MB/?MB/?MB
# Degree of Parallelism (vcores) local/remote = 48/?
PROGRAM ( size CP/SP = 10/0 )
--MAIN PROGRAM
----GENERIC (lines 8-24) [recompile=false]
------CP createvar _mVar1 scratch_space//_p22241_10.168.31.117//_t0/temp1 true MATRIX binaryblock 30000 30000 1000 1000 900000000 copy
------CP rand 30000 30000 1000 1000 -1 1 1.0 -1 uniform 1.0 48 _mVar1.MATRIX.DOUBLE
------CP createvar _mVar2 scratch_space//_p22241_10.168.31.117//_t0/temp2 true MATRIX binaryblock 30000 30000 1000 1000 -1 copy
------CP tsmm _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT 48
------CP rmvar _mVar1
------CP uasqk+ _mVar2.MATRIX.DOUBLE _Var3.SCALAR.DOUBLE 48
------CP rmvar _mVar2
------CP print _Var3.SCALAR.DOUBLE.false _Var4.SCALAR.STRING
------CP rmvar _Var3
------CP rmvar _Var4
17/01/05 18:11:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
6.00000141554168E12
17/01/05 18:40:38 INFO api.DMLScript: SystemML Statistics:
Total elapsed time: 1768.639 sec.
Total compilation time: 0.266 sec.
Total execution time: 1768.373 sec.
Number of compiled Spark inst: 0.
Number of executed Spark inst: 0.
Cache hits (Mem, WB, FS, HDFS): 2/0/0/0.
Cache writes (WB, FS, HDFS): 2/1/0.
Cache times (ACQr/m, RLS, EXP): 0.000/0.001/13.410/0.000 sec.
HOP DAGs recompiled (PRED, SB): 0/0.
HOP DAGs recompile time: 0.000 sec.
Spark ctx create time (lazy): 0.000 sec.
Spark trans counts (par,bc,col): 0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
Total JIT compile time: 1.345 sec.
Total JVM GC count: 3.
Total JVM GC time: 4.428 sec.
Heavy hitter instructions (name, time, count):
-- 1) tsmm 1762.014 sec 1
-- 2) rand 5.210 sec 1
-- 3) uasqk+ 0.541 sec 1
-- 4) rmvar 0.002 sec 4
-- 5) print 0.002 sec 1
-- 6) createvar 0.001 sec 2
17/01/05 18:40:38 INFO api.DMLScript: END DML run 01/05/2017 18:40:38
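For context, tsmm stands for transpose-self matrix multiplication: RIGHT corresponds to X %*% t(X) and LEFT to t(X) %*% X. A minimal numpy sketch of the two variants (the `tsmm` helper and its `side` argument are my own illustration, not SystemML's API); note that both outputs are symmetric, so both variants should require roughly the same amount of work:

```python
import numpy as np

def tsmm(X, side="LEFT"):
    """Transpose-self matrix multiplication (illustrative helper):
    LEFT computes t(X) %*% X, RIGHT computes X %*% t(X).
    The result is symmetric either way, so an optimized kernel
    only needs to compute one triangle and mirror it."""
    return X.T @ X if side == "LEFT" else X @ X.T

rng = np.random.default_rng(0)
L = rng.uniform(-1, 1, size=(5, 5))
# Both variants produce symmetric matrices.
assert np.allclose(tsmm(L, "LEFT"), tsmm(L, "LEFT").T)
assert np.allclose(tsmm(L, "RIGHT"), tsmm(L, "RIGHT").T)
```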
If I instead calculate t(L) %*% L (i.e., change line 3 to LLT = t(L) %*% L), which is translated to tsmm LEFT, it takes only about 5 minutes. Here is the output of explain and stats in this case:
> /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit --master=spark://rr-ram4.softlayer.com:7077 --executor-memory=20g --driver-memory=60g --executor-cores=4 --num-executors=20 sysml/target/SystemML.jar -f tsmm_verify.dml -explain -stats
17/01/05 18:03:13 INFO api.DMLScript: BEGIN DML run 01/05/2017 18:03:13
17/01/05 18:03:13 INFO conf.DMLConfig: Using internal default configuration settings. If you wish to customize any settings, please supply a `SystemML-config.xml` file.
17/01/05 18:03:13 INFO api.DMLScript: EXPLAIN (RUNTIME):
# Memory Budget local/remote = 41216MB/?MB/?MB/?MB
# Degree of Parallelism (vcores) local/remote = 48/?
PROGRAM ( size CP/SP = 10/0 )
--MAIN PROGRAM
----GENERIC (lines 8-24) [recompile=false]
------CP createvar _mVar1 scratch_space//_p19190_10.168.31.117//_t0/temp1 true MATRIX binaryblock 30000 30000 1000 1000 900000000 copy
------CP rand 30000 30000 1000 1000 -1 1 1.0 -1 uniform 1.0 48 _mVar1.MATRIX.DOUBLE
------CP createvar _mVar2 scratch_space//_p19190_10.168.31.117//_t0/temp2 true MATRIX binaryblock 30000 30000 1000 1000 -1 copy
------CP tsmm _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE LEFT 48
------CP rmvar _mVar1
------CP uasqk+ _mVar2.MATRIX.DOUBLE _Var3.SCALAR.DOUBLE 48
------CP rmvar _mVar2
------CP print _Var3.SCALAR.DOUBLE.false _Var4.SCALAR.STRING
------CP rmvar _Var3
------CP rmvar _Var4
17/01/05 18:03:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
5.999491296172791E12
17/01/05 18:09:14 INFO api.DMLScript: SystemML Statistics:
Total elapsed time: 361.293 sec.
Total compilation time: 0.267 sec.
Total execution time: 361.026 sec.
Number of compiled Spark inst: 0.
Number of executed Spark inst: 0.
Cache hits (Mem, WB, FS, HDFS): 2/0/0/0.
Cache writes (WB, FS, HDFS): 2/1/0.
Cache times (ACQr/m, RLS, EXP): 0.000/0.001/12.827/0.000 sec.
HOP DAGs recompiled (PRED, SB): 0/0.
HOP DAGs recompile time: 0.000 sec.
Spark ctx create time (lazy): 0.000 sec.
Spark trans counts (par,bc,col): 0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
Total JIT compile time: 1.32 sec.
Total JVM GC count: 3.
Total JVM GC time: 4.271 sec.
Heavy hitter instructions (name, time, count):
-- 1) tsmm 354.384 sec 1
-- 2) rand 5.181 sec 1
-- 3) uasqk+ 0.842 sec 1
-- 4) rmvar 0.002 sec 4
-- 5) createvar 0.001 sec 2
-- 6) print 0.001 sec 1
17/01/05 18:09:14 INFO api.DMLScript: END DML run 01/05/2017 18:09:14
The time difference between the two cases (tsmm: 1762 s for RIGHT vs. 354 s for LEFT) is far too large for two variants that perform essentially the same computation. This strongly suggests a bug in the runtime code.
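Note also that the two scripts should agree (up to random-number generation) on the printed scalar, since ||L L^T||_F^2 = tr(L L^T L L^T) = tr(L^T L L^T L) = ||L^T L||_F^2; the values 6.00000141554168E12 and 5.999491296172791E12 above presumably differ only because each run draws a fresh random matrix. A small numpy check of this identity on a single matrix (sizes chosen arbitrarily for a quick run):

```python
import numpy as np

# sum((L %*% t(L))^2) and sum((t(L) %*% L)^2) are the same scalar:
# the squared Frobenius norm is invariant under this cyclic trace
# permutation, so RIGHT and LEFT tsmm feed identical work downstream.
rng = np.random.default_rng(42)
L = rng.uniform(-1, 1, size=(50, 50))
fn_right = np.sum((L @ L.T) ** 2)
fn_left = np.sum((L.T @ L) ** 2)
assert np.isclose(fn_right, fn_left)
```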