SystemDS / SYSTEMDS-1554

IPA Scalar Transient Read Replacement

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: SystemML 0.15
    • Component/s: None
    • Labels: None

    Description

      Currently, during IPA (inter-procedural analysis), we collect all variables (scalars and matrices) eligible for propagation across blocks (i.e., not updated in the block), but then propagate only the matrix sizes across the blocks. It seems plausible that we could also replace all eligible scalar transient reads with literals, based on the variables that have already been collected. The benefit is that many ops would be able to determine their output sizes during regular compilation instead of having to wait for dynamic recompilation, which reduces the pressure on dynamic recompilation.
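
      To make the idea concrete, here is a toy Python sketch of the proposed replacement. It is not SystemML code: the `Block` class, the `replace_scalar_reads` and `size_known` helpers, and the `rand_ops` representation are all hypothetical names invented for illustration, and the program is assumed to be a simple straight-line sequence of blocks. It only shows how scalars that are known on block entry, and not updated in the block, could be substituted as literals so that output dimensions become known at compile time.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical program block (illustration only, not a SystemML class)."""
    # Scalars written in this block: name -> literal value, or None if the
    # value is not a compile-time constant.
    assigns: dict = field(default_factory=dict)
    # Matrix-creating ops as (rows, cols); each entry is either an int
    # literal or the name of a scalar variable read from an earlier block.
    rand_ops: list = field(default_factory=list)


def replace_scalar_reads(blocks):
    """Return each block's ops with eligible scalar reads replaced by literals."""
    known = {}   # scalar name -> constant value known at the current point
    rewritten = []
    for block in blocks:
        # A scalar is eligible only if it is known on entry to the block
        # and the block itself does not update it.
        eligible = {k: v for k, v in known.items() if k not in block.assigns}
        rewritten.append(
            [tuple(eligible.get(d, d) for d in dims) for dims in block.rand_ops]
        )
        # After the block, literal assignments become known constants,
        # and non-constant assignments invalidate earlier knowledge.
        for name, value in block.assigns.items():
            if value is None:
                known.pop(name, None)
            else:
                known[name] = value
    return rewritten


def size_known(dims):
    """An output size is known at compile time if every dimension is a literal."""
    return all(isinstance(d, int) for d in dims)


blocks = [
    Block(assigns={"N": 64, "D": 784}),                 # defines two scalars
    Block(rand_ops=[("N", "D")]),                       # reads both scalars
    Block(assigns={"N": None}, rand_ops=[("N", "D")]),  # updates N itself
]
result = replace_scalar_reads(blocks)
```

      In this sketch, the op in the second block becomes `(64, 784)`, so its size is known during regular compilation. In the third block, `N` is updated in the block and is therefore not eligible, so only `D` is replaced and the size stays unknown until runtime.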

      Are there drawbacks to this approach? The use case is that I was seeing a large number of memory warnings while training a convolutional net, because the sizes were unknown during regular compilation and the engine only has CP versions of the ops. Additionally, I was running into actual heap space OOM errors in situations that should not run out of memory, so I started exploring.

      I've attached an example script and the explain plan (hops & runtime) w/ and w/o the IPA scalar replacement.

      Attachments

        1. convnet_distrib_sgd.dml
          29 kB
          Mike Dusenberry
        2. parfor_oom_convnet.py
          1 kB
          Mike Dusenberry
        3. parfor_oom_convnet_plan.txt
          6 kB
          Mike Dusenberry
        4. parfor_oom_plan.txt
          26 kB
          Mike Dusenberry
        5. parfor_oom.py
          2 kB
          Mike Dusenberry

            People

              Assignee: Mike Dusenberry (dusenberrymw)
              Reporter: Mike Dusenberry (dusenberrymw)
              Votes: 0
              Watchers: 2
