SystemDS / SYSTEMDS-1011

Slow sparse append/cbind (sparse row re-allocations)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: SystemML 0.11
    • Component/s: None
    • Labels: None

    Description

      All algorithms that support the 'intercept' option (e.g., LinregCG, LinregDS, L2SVM, MSVM, Mlogreg, and GLM) append a column of ones to the feature matrix at the beginning of the script (roughly X = cbind(X, matrix(1, rows=nrow(X), cols=1)) in DML). On large sparse data, this append sometimes dominates end-to-end performance. For example, here are the LinRegCG end-to-end runtimes in seconds for a 10Mx1K scenario with sparsity 0.01:

      -- Running runLinearRegCG on 10M_1k_sparse (all configs)
      LinRegCG train ict=0 on mbperftest/binomial/X10M_1k_sparse: 7
      LinRegCG train ict=1 on mbperftest/binomial/X10M_1k_sparse: 15
      LinRegCG train ict=2 on mbperftest/binomial/X10M_1k_sparse: 15
      -- Running runLinearRegCG on 10M_1k_sparse (all configs)
      LinRegCG train ict=0 on mbperftest/binomial/X10M_1k_sparse: 7
      LinRegCG train ict=1 on mbperftest/binomial/X10M_1k_sparse: 15
      LinRegCG train ict=2 on mbperftest/binomial/X10M_1k_sparse: 15
      -- Running runLinearRegCG on 10M_1k_sparse (all configs)
      LinRegCG train ict=0 on mbperftest/binomial/X10M_1k_sparse: 6
      LinRegCG train ict=1 on mbperftest/binomial/X10M_1k_sparse: 15
      LinRegCG train ict=2 on mbperftest/binomial/X10M_1k_sparse: 16
      -- Running runLinearRegCG on 10M_1k_sparse (all configs)
      LinRegCG train ict=0 on mbperftest/binomial/X10M_1k_sparse: 7
      LinRegCG train ict=1 on mbperftest/binomial/X10M_1k_sparse: 16
      LinRegCG train ict=2 on mbperftest/binomial/X10M_1k_sparse: 15
      

      With an intercept (ict=1/2), the runtime roughly doubles compared to ict=0. Here is the related -stats output for ict=1:

      Total elapsed time:		16.893 sec.
      Total compilation time:		2.412 sec.
      Total execution time:		14.480 sec.
      Number of compiled Spark inst:	0.
      Number of executed Spark inst:	0.
      Cache hits (Mem, WB, FS, HDFS):	172/0/0/2.
      Cache writes (WB, FS, HDFS):	77/0/1.
      Cache times (ACQr/m, RLS, EXP):	1.734/0.003/2.143/0.209 sec.
      HOP DAGs recompiled (PRED, SB):	0/0.
      HOP DAGs recompile time:	0.000 sec.
      Spark ctx create time (lazy):	0.000 sec.
      Spark trans counts (par,bc,col):0/0/0.
      Spark trans times (par,bc,col):	0.000/0.000/0.000 secs.
      Total JIT compile time:		5.357 sec.
      Total JVM GC count:		2.
      Total JVM GC time:		5.628 sec.
      Heavy hitter instructions (name, time, count):
      -- 1) 	append 	8.595 sec 	26
      -- 2) 	mmchain 	4.443 sec 	8
      -- 3) 	ba+* 	0.537 sec 	10
      -- 4) 	r' 	0.411 sec 	10
      -- 5) 	write 	0.210 sec 	1
      -- 6) 	- 	0.087 sec 	20
      -- 7) 	uak+ 	0.059 sec 	2
      -- 8) 	tsmm 	0.049 sec 	11
      -- 9) 	rand 	0.043 sec 	5
      -- 10) 	+* 	0.007 sec 	24
      

      The large GC time indicates that sparse row re-allocations are the major bottleneck here: each append grows the output rows incrementally, repeatedly copying and discarding intermediate arrays. We should instead compute the joint nnz per output row upfront and allocate each output sparse row just once, as sketched below.
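
      A minimal sketch of the proposed allocation strategy, in plain Java. SimpleSparseRow is a hypothetical, simplified stand-in for SystemML's internal sparse row representation (parallel arrays of column indexes and values); only the contrast between incremental growth and one-shot allocation is the point, not the actual API.

      // Hypothetical simplified sparse row: parallel arrays of column indexes and values.
      class SimpleSparseRow {
          int size;        // current number of non-zeros
          int[] indexes;   // column indexes of the non-zeros
          double[] values; // values of the non-zeros

          SimpleSparseRow(int capacity) {
              indexes = new int[capacity];
              values  = new double[capacity];
          }

          // Append one non-zero, doubling the arrays when full (incremental growth).
          void append(int col, double val) {
              if (size == indexes.length) {
                  // re-allocation: copies everything appended so far; old arrays become garbage
                  indexes = java.util.Arrays.copyOf(indexes, Math.max(2 * size, 4));
                  values  = java.util.Arrays.copyOf(values,  Math.max(2 * size, 4));
              }
              indexes[size] = col;
              values[size]  = val;
              size++;
          }
      }

      class SparseCbindSketch {
          // Problematic pattern: grow the output row value by value.
          static SimpleSparseRow cbindIncremental(SimpleSparseRow l, SimpleSparseRow r, int lcols) {
              SimpleSparseRow out = new SimpleSparseRow(4); // starts small, re-allocates repeatedly
              for (int i = 0; i < l.size; i++)
                  out.append(l.indexes[i], l.values[i]);
              for (int i = 0; i < r.size; i++)
                  out.append(lcols + r.indexes[i], r.values[i]); // shift rhs column indexes
              return out;
          }

          // Proposed fix: compute the joint nnz and allocate the output row exactly once.
          static SimpleSparseRow cbindPreallocated(SimpleSparseRow l, SimpleSparseRow r, int lcols) {
              SimpleSparseRow out = new SimpleSparseRow(l.size + r.size); // exact joint nnz
              System.arraycopy(l.indexes, 0, out.indexes, 0, l.size);
              System.arraycopy(l.values,  0, out.values,  0, l.size);
              for (int i = 0; i < r.size; i++)
                  out.indexes[l.size + i] = lcols + r.indexes[i]; // shift rhs column indexes
              System.arraycopy(r.values, 0, out.values, l.size, r.size);
              out.size = l.size + r.size;
              return out;
          }
      }

      For the 10Mx1K scenario with sparsity 0.01 (about 10 non-zeros per row across 10M rows), the preallocated variant performs exactly one allocation per output row, whereas the incremental variant triggers multiple grow-and-copy steps per row whose discarded intermediates must be garbage-collected, consistent with the large GC time above.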

            People

              Assignee: Matthias Boehm (mboehm7)
              Reporter: Matthias Boehm (mboehm7)
