SystemDS / SYSTEMDS-2989

Thread contention in parfor eval function calls

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: SystemDS 2.1
    • Component/s: None
    • Labels: None

    Description

      In parfor, individual functions are deep copied per worker to prevent thread contention on shared HOP DAGs during recompilation. However, functions called through eval (i.e., functions unknown at compile time) do not get these per-worker copies. As a result, gridSearch over functions that do not benefit from recompile-once function compilation (e.g., MLogreg) shows large overhead.
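The idea of per-worker function copies can be illustrated with a minimal Python sketch. It is not SystemDS code: `FunctionPlan`, `WorkerContext`, `run_parfor`, and the function name `train` are hypothetical stand-ins, and the lock merely models contention on a shared HOP DAG during recompilation. The sketch assumes that a function resolved by name at runtime (as with eval) can be copied lazily on a worker's first call.

```python
import copy
import threading
from concurrent.futures import ThreadPoolExecutor


class FunctionPlan:
    """Hypothetical stand-in for a compiled function with a mutable DAG."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # models the guard around recompilation
        self.recompile_count = 0

    def __deepcopy__(self, memo):
        # Locks cannot be deep copied; a copy gets a fresh lock and counter.
        return FunctionPlan(self.name)

    def recompile_and_run(self, x):
        with self.lock:  # uncontended if the plan is private to one worker
            self.recompile_count += 1
        return x * 2


class WorkerContext:
    """Per-worker view of the function dictionary."""

    def __init__(self, shared_functions):
        self._shared = shared_functions
        self.local = {}

    def eval_call(self, name, x):
        # The name is only known at runtime, so copy lazily on first use.
        plan = self.local.get(name)
        if plan is None:
            plan = copy.deepcopy(self._shared[name])
            self.local[name] = plan
        return plan.recompile_and_run(x)


def run_parfor(shared, name, num_workers, tasks_per_worker):
    contexts = [WorkerContext(shared) for _ in range(num_workers)]

    def worker(ctx):
        return [ctx.eval_call(name, i) for i in range(tasks_per_worker)]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(worker, contexts))
    return contexts, results


shared = {"train": FunctionPlan("train")}
contexts, results = run_parfor(shared, "train", num_workers=4, tasks_per_worker=100)
```

After the run, the shared plan has never been recompiled, while each worker has recompiled only its own private copy, so no two workers ever wait on the same lock.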

      In a gridSearch scenario on a single node (112 vcores) with the Adult dataset, 3003 hyper-parameter configurations, and MLogreg, the statistics output is as follows (note the recompilation statistics):

      Total elapsed time:             1911.113 sec.
      Total compilation time:         0.901 sec.
      Total execution time:           1910.212 sec.
      Cache hits (Mem/Li/WB/FS/HDFS): 141741896/0/0/0/1.
      Cache writes (Li/WB/FS/HDFS):   0/3006/0/2.
      Cache times (ACQr/m, RLS, EXP): 32.152/10.918/37.762/0.083 sec.
      HOP DAGs recompiled (PRED, SB): 0/21456699.
      HOP DAGs recompile time:        147624.831 sec.
      Functions recompiled:           3005.
      Functions recompile time:       6767.515 sec.
      ParFor loops optimized:         2.
      ParFor optimize time:           0.062 sec.
      ParFor initialize time:         0.046 sec.
      ParFor result merge time:       0.215 sec.
      ParFor total update in-place:   0/15021/609900
      Total JIT compile time:         39.483 sec.
      Total JVM GC count:             37.
      Total JVM GC time:              2.724 sec.
      Heavy hitter instructions:
        #  Instruction       Time(s)      Count
        1  eval          173,472.884       6006
        2  mmchain        14,039.611    5007211
        3  m_gridSearch    1,907.351          1
        4  sprop           1,683.663    5007211
        5  rightIndex        934.766    6511372
        6  ba+*              730.358    6415971
        7  exp               206.746     136373
        8  rmvar             161.539  112050148
        9  log               145.937     136373
       10  append            118.688     141379
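To put the recompilation numbers in perspective, a few ratios can be derived directly from the statistics above. The short Python sketch below only does arithmetic on the reported values; the variable names are illustrative. Reading the recompile time as a sum over worker threads is an assumption, suggested by the fact that it exceeds the elapsed time.

```python
# Values copied from the statistics output above.
elapsed_sec = 1911.113
dag_recompiles = 21_456_699      # HOP DAGs recompiled (SB)
dag_recompile_sec = 147_624.831  # HOP DAGs recompile time
eval_calls = 6006                # count of the eval heavy hitter
eval_sec = 173_472.884           # time of the eval heavy hitter

# Average number of DAG recompilations triggered per eval call.
recompiles_per_eval = dag_recompiles / eval_calls

# Average cost of one DAG recompilation in milliseconds.
ms_per_recompile = 1000 * dag_recompile_sec / dag_recompiles

# Fraction of the time attributed to eval that is recompilation.
recompile_share_of_eval = dag_recompile_sec / eval_sec

# Recompile time relative to elapsed time (assumed to be summed
# over threads, since it exceeds the elapsed time).
recompile_vs_elapsed = dag_recompile_sec / elapsed_sec

print(f"recompilations per eval call: {recompiles_per_eval:.1f}")
print(f"ms per recompilation:         {ms_per_recompile:.2f}")
print(f"recompile share of eval time: {recompile_share_of_eval:.1%}")
print(f"recompile time / elapsed:     {recompile_vs_elapsed:.1f}")
```

This works out to roughly 3,570 DAG recompilations per eval call at about 6.9 ms each, with recompilation making up about 85% of the time attributed to eval and about 77 times the elapsed time.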
      

          People

            Assignee: mboehm7 Matthias Boehm
            Reporter: mboehm7 Matthias Boehm