Uploaded image for project: 'SystemDS'
  1. SystemDS
  2. SYSTEMDS-1321 Compiler feature extensions
  3. SYSTEMDS-1336

Improve parfor exec type selection (w/ potential data partitioning)

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Done
    • None
    • SystemML 0.14
    • Compiler
    • None

    Description

      This task aims to address suboptimal parfor optimizer choices for partitionable scenarios with large driver memory. Currently, we only apply partitioning, if the right indexing operation does not fit in memory of the driver or remote tasks. The execution type selection is then unaware of potential partitioning, and does not revert this decision - this is problematic, because the large input likely exceeds the memory budget of remote tasks, ultimately causing the optimizer to fall back to a local parfor with very small degree of parallelism k.

      On our perftest 8GB Univariate stats scenario (with 20GB driver, i.e., 14GB memory budget), this lead to a local parfor with k=1 and thus, unnecessarily high execution time.

      Total elapsed time:		781.233 sec.
      Total compilation time:		2.059 sec.
      Total execution time:		779.175 sec.
      Number of compiled Spark inst:	0.
      Number of executed Spark inst:	0.
      Cache hits (Mem, WB, FS, HDFS):	27904/0/0/2.
      Cache writes (WB, FS, HDFS):	3134/0/1.
      Cache times (ACQr/m, RLS, EXP):	9.200/0.022/0.301/0.300 sec.
      HOP DAGs recompiled (PRED, SB):	0/100.
      HOP DAGs recompile time:	0.238 sec.
      Spark ctx create time (lazy):	0.000 sec.
      Spark trans counts (par,bc,col):0/0/0.
      Spark trans times (par,bc,col):	0.000/0.000/0.000 secs.
      ParFor loops optimized:		1.
      ParFor optimize time:		1.985 sec.
      ParFor initialize time:		0.007 sec.
      ParFor result merge time:	0.003 sec.
      ParFor total update in-place:	0/0/13900
      Total JIT compile time:		13.542 sec.
      Total JVM GC count:		29.
      Total JVM GC time:		3.49 sec.
      Heavy hitter instructions (name, time, count):
      -- 1) 	cm 	479.000 sec 	2700
      -- 2) 	qsort 	228.928 sec 	900
      -- 3) 	qpick 	20.598 sec 	1800
      -- 4) 	rangeReIndex 	16.051 sec 	2999
      -- 5) 	uamean 	12.867 sec 	900
      -- 6) 	uacmax 	9.870 sec 	1
      -- 7) 	ctable 	3.158 sec 	100
      -- 8) 	uamin 	2.589 sec 	1000
      -- 9) 	uamax 	2.560 sec 	1101
      -- 10) 	write 	0.300 sec 	1
      

      Attachments

        Activity

          People

            mboehm7 Matthias Boehm
            mboehm7 Matthias Boehm
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: