SystemDS / SYSTEMDS-2478

Overhead when using parfor in update func

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      When parfor is used inside the update function, MR tasks are launched to write the task outputs, and the paramserv run takes longer than the same run without parfor in the update function. The scenario is to launch the ASP Epoch DC spark paramserv test.
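
      For context, the test invokes paramserv roughly as sketched below. This is a minimal sketch, not the exact test code: the variable names, epochs, and batch size are placeholders; the ASP utype, EPOCH frequency, DISJOINT_CONTIGUOUS (DC) scheme, Spark mode, and k=3 workers follow from the scenario and the statistics below.

      # Minimal sketch of the scenario (placeholder names and sizes):
      # asynchronous updates (ASP), per-epoch aggregation, disjoint-
      # contiguous (DC) data partitioning, run on 3 remote Spark workers.
      modelList2 = paramserv(model=modelList, features=X, labels=Y,
          upd="gradients", agg="aggregation", mode="REMOTE_SPARK",
          utype="ASP", freq="EPOCH", epochs=10, batchsize=64,
          k=3, scheme="DISJOINT_CONTIGUOUS", hyperparams=params)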
      Here is the statistics output:

      Total elapsed time:		101.804 sec.
      Total compilation time:		3.690 sec.
      Total execution time:		98.114 sec.
      Number of compiled Spark inst:	302.
      Number of executed Spark inst:	540.
      Cache hits (Mem, WB, FS, HDFS):	57839/0/0/240.
      Cache writes (WB, FS, HDFS):	14567/58/61.
      Cache times (ACQr/m, RLS, EXP):	42.346/0.064/4.761/20.280 sec.
      HOP DAGs recompiled (PRED, SB):	0/144.
      HOP DAGs recompile time:	0.507 sec.
      Functions recompiled:		16.
      Functions recompile time:	0.064 sec.
      Spark ctx create time (lazy):	1.376 sec.
      Spark trans counts (par,bc,col):270/1/240.
      Spark trans times (par,bc,col):	0.573/0.197/42.255 secs.
      Paramserv total num workers:	3.
      Paramserv setup time:		1.559 secs.
      Paramserv grad compute time:	105.701 secs.
      Paramserv model update time:	56.801/47.193 secs.
      Paramserv model broadcast time:	23.872 secs.
      Paramserv batch slice time:	0.000 secs.
      Paramserv RPC request time:	105.159 secs.
      ParFor loops optimized:		1.
      ParFor optimize time:		0.040 sec.
      ParFor initialize time:		0.434 sec.
      ParFor result merge time:	0.005 sec.
      ParFor total update in-place:	0/7/7
      Total JIT compile time:		68.384 sec.
      Total JVM GC count:		1120.
      Total JVM GC time:		22.338 sec.
      Heavy hitter instructions:
        #  Instruction             Time(s)  Count
        1  paramserv                97.221      1
        2  conv2d_bias_add          60.581    614
        3  *                        54.990  12447
        4  sp_-                     20.625    240
        5  -                        17.979   7287
        6  +                        14.191  12824
        7  r'                        5.636   1200
        8  conv2d_backward_filter    5.123    600
        9  max                       4.985    907
       10  ba+*                      4.591   1814
      
      

      Here is the cleaned-up update function:

      # note: assumes the script sources the nn library, e.g.
      # source("nn/optim/sgd_nesterov.dml") as sgd_nesterov
      aggregation = function(list[unknown] model,
                             list[unknown] gradients,
                             list[unknown] hyperparams)
         return (list[unknown] modelResult) {
           lr = as.double(as.scalar(hyperparams["lr"]))
           mu = as.double(as.scalar(hyperparams["mu"]))
      
           modelResult = model
      
           # Optimize with SGD w/ Nesterov momentum
           parfor(i in 1:8, check=0) {
             P = as.matrix(model[i])
             dP = as.matrix(gradients[i])
             vP = as.matrix(model[8+i])
             [P, vP] = sgd_nesterov::update(P, dP, lr, mu, vP)
             modelResult[i] = P
             modelResult[8+i] = vP
           }
         }
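
      For comparison, the same update logic without parfor (the configuration that runs faster) is simply a sequential loop over the same body:

      # Sequential variant of the parfor loop above: identical body,
      # but no parfor task creation or result handling is involved.
      for (i in 1:8) {
        P = as.matrix(model[i])
        dP = as.matrix(gradients[i])
        vP = as.matrix(model[8+i])
        [P, vP] = sgd_nesterov::update(P, dP, lr, mu, vP)
        modelResult[i] = P
        modelResult[8+i] = vP
      }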
      

      mboehm7, in fact I have no idea where this overhead comes from. It seems that the parfor task outputs are written to HDFS. Is that the normal behavior?

          People

            Assignee: Unassigned
            Reporter: Guobao LI
