Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: SystemML 1.1
    • Component/s: None
    • Labels: None

    Description

      While codegen worked extremely well for KMeans with a single run, we currently see performance issues in a parfor setting with 10 concurrent runs, all of which spawn distributed Spark operations. In detail, this is due to particular plan choices that are affected by the reduced local memory budget per parfor worker. However, these issues can be overcome by avoiding unnecessary RDD joins in distributed codegen operations via better broadcast handling (currently, the first input is always assumed to be an RDD).
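The effect of the "first input is always an RDD" assumption can be illustrated with a small sketch. This is not SystemML's implementation: `OperatorInput`, both planning functions, the per-input broadcast budget, and the example sizes are all hypothetical names and values chosen for illustration. The sketch only shows why a fixed choice of the main RDD input can force a join that a size-aware choice would replace with a broadcast.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OperatorInput:
    """Hypothetical description of one input of a fused operator."""
    name: str
    size_bytes: int


Plan = Tuple[OperatorInput, List[OperatorInput], List[OperatorInput]]


def _split_sides(sides, broadcast_budget_bytes):
    # Side inputs that fit into the budget are broadcast; the rest need an RDD join.
    broadcasts = [s for s in sides if s.size_bytes <= broadcast_budget_bytes]
    joins = [s for s in sides if s.size_bytes > broadcast_budget_bytes]
    return broadcasts, joins


def plan_first_as_rdd(inputs, broadcast_budget_bytes) -> Plan:
    """Policy described in the issue: the first input is always the RDD."""
    main = inputs[0]
    broadcasts, joins = _split_sides(inputs[1:], broadcast_budget_bytes)
    return main, broadcasts, joins


def plan_largest_as_rdd(inputs, broadcast_budget_bytes) -> Plan:
    """Assumed alternative: keep the largest input as the RDD, broadcast the rest if possible."""
    main = max(inputs, key=lambda i: i.size_bytes)
    sides = [i for i in inputs if i is not main]
    broadcasts, joins = _split_sides(sides, broadcast_budget_bytes)
    return main, broadcasts, joins


if __name__ == "__main__":
    # Hypothetical sizes: a small first input and a large second input.
    inputs = [OperatorInput("C", 8_000), OperatorInput("X", 8_000_000_000)]
    budget = 100_000_000  # assumed per-worker broadcast budget in bytes

    for policy in (plan_first_as_rdd, plan_largest_as_rdd):
        main, broadcasts, joins = policy(inputs, budget)
        print(policy.__name__,
              "main:", main.name,
              "broadcast:", [b.name for b in broadcasts],
              "join:", [j.name for j in joins])
```

With a smaller per-worker budget, as in the parfor setting, more side inputs fall into the join list under either policy, so the choice of the main input matters more.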

      Total elapsed time:		9305.981 sec.
      Total compilation time:		3.023 sec.
      Total execution time:		9302.958 sec.
      Number of compiled Spark inst:	21.
      Number of executed Spark inst:	193.
      Cache hits (Mem, WB, FS, HDFS):	1242/0/0/91.
      Cache writes (WB, FS, HDFS):	456/188/1.
      Cache times (ACQr/m, RLS, EXP):	10086.631/0.011/114.967/1.291 sec.
      HOP DAGs recompiled (PRED, SB):	0/108.
      HOP DAGs recompile time:	2.733 sec.
      Functions recompiled:		1.
      Functions recompile time:	0.043 sec.
      Codegen compile (DAG,CP,JC):	176/430/21.
      Codegen enum (ALLt/p,EVALt/p):	48076/47974/39249/38324.
      Codegen compile times (DAG,JC):	3.024/0.491 sec.
      Codegen enum plan cache hits:	0/0.
      Codegen op plan cache hits:	395/416.
      Spark ctx create time (lazy):	19.506 sec.
      Spark trans counts (par,bc,col):	0/179/91.
      Spark trans times (par,bc,col):	0.000/1.954/10086.614 secs.
      ParFor loops optimized:		1.
      ParFor optimize time:		0.141 sec.
      ParFor initialize time:		0.022 sec.
      ParFor result merge time:	0.059 sec.
      ParFor total update in-place:	0/40/50
      Total JIT compile time:		98.963 sec.
      Total JVM GC count:		374.
      Total JVM GC time:		72.456 sec.
      Heavy hitter instructions:
        #  Instruction          Time(s)  Count
        1  sp_spoofRATMP63   73,750.553     89
        2  spoofRATMP43      10,195.724     89
        3  sp_chkpoint           20.239     12
        4  sp_uasqk+             14.347      1
        5  spoofRATMP52          10.496     89
        6  ba+*                   9.273     15
        7  sp_mapmm               1.543      1
        8  write                  1.291      1
        9  /                      1.127     92
       10  sp_spoofRATMP116       0.930     89
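To see how strongly the two fused operators dominate, the heavy hitter table can be parsed and summarized. The following is a minimal sketch: the table rows are copied from the output above, while `parse_heavy_hitters` is a hypothetical helper and not part of SystemML. Note that the listed times add up to far more than the total elapsed time, presumably because they are accumulated across the concurrent parfor workers, so the shares are relative to the listed instructions only.

```python
HEAVY_HITTERS = """\
  1  sp_spoofRATMP63   73,750.553     89
  2  spoofRATMP43      10,195.724     89
  3  sp_chkpoint           20.239     12
  4  sp_uasqk+             14.347      1
  5  spoofRATMP52          10.496     89
  6  ba+*                   9.273     15
  7  sp_mapmm               1.543      1
  8  write                  1.291      1
  9  /                      1.127     92
 10  sp_spoofRATMP116       0.930     89
"""


def parse_heavy_hitters(text):
    """Parse rows of the form '<rank> <instruction> <time(s)> <count>'."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a four-column data row.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        _, name, time_s, count = fields
        # Times use ',' as a thousands separator.
        rows.append((name, float(time_s.replace(",", "")), int(count)))
    return rows


rows = parse_heavy_hitters(HEAVY_HITTERS)
total = sum(t for _, t, _ in rows)

if __name__ == "__main__":
    for name, t, count in rows[:3]:
        share = 100.0 * t / total
        print(f"{name:18s} {t:12.3f} s  {count:4d} calls  {share:5.1f}% of listed time")
```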
      

      An initial prototype that avoids the unnecessary shuffle reduced the total elapsed time from 9305s to 1607s (roughly 5.8x), but additional improvements are possible.

          People

            Assignee: mboehm7 Matthias Boehm
            Reporter: mboehm7 Matthias Boehm
            Votes: 0
            Watchers: 1