SystemDS / SYSTEMDS-1308 Runtime feature extensions / SYSTEMDS-1350

Performance parfor spark datapartition-execute

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: SystemML 0.14
    • Component/s: APIs, Runtime
    • Labels: None

      Description

      Our fused parfor spark datapartition-execute job, as used for large scenarios of univariate statistics, exhibits some unnecessary runtime overheads. In detail, the potential improvements include:

      1) Incremental nnz maintenance on partition collect
      2) Reuse of dense partitions per task (avoid reallocation)
      3) Explicitly control the number of output partitions (avoid OOMs, reduce memory pressure)
      4) Avoid unnecessary rdd export on parfor data partitioning
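
      Points (1) and (2) can be illustrated with a minimal sketch. This is not SystemML's actual implementation: the class `DensePartitionBuffer` and its methods are hypothetical names chosen for illustration, and a flat Python list stands in for a dense matrix block. The idea shown is that the buffer is allocated once and cleared between uses, and that the non-zero count (nnz) is adjusted per collected row instead of being recomputed by a full scan.

```python
class DensePartitionBuffer:
    """Hypothetical reusable dense buffer (illustration only)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.data = [0.0] * (rows * cols)  # allocated once, reused afterwards
        self.nnz = 0                       # maintained incrementally

    def reset(self):
        # Point (2): clear in place instead of allocating a new buffer.
        for i in range(len(self.data)):
            self.data[i] = 0.0
        self.nnz = 0

    def set_row(self, r, values):
        if len(values) != self.cols:
            raise ValueError("row length must equal the number of columns")
        start = r * self.cols
        # Point (1): update nnz by the difference between the row being
        # overwritten and the new row, so no scan of the whole block is needed.
        old = sum(1 for v in self.data[start:start + self.cols] if v != 0.0)
        new = sum(1 for v in values if v != 0.0)
        self.data[start:start + self.cols] = values  # in-place slice write
        self.nnz += new - old


buf = DensePartitionBuffer(2, 3)
buf.set_row(0, [1.0, 0.0, 2.0])
buf.set_row(1, [0.0, 0.0, 3.0])
print(buf.nnz)  # 3
buf.reset()     # same underlying list, ready for the next partition
```

      In this sketch the per-row cost of keeping nnz current is proportional to the row length, whereas a recount after collecting the partition would touch every cell again.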

      Points (3) and (4) also apply to the parfor spark datapartition job.
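
      As a rough illustration of point (3), the number of output partitions can be derived from an estimated data size and a target partition size rather than left to a default. The helper below is a hypothetical sketch, not code from SystemML: the function name, its parameters, and the 128 MiB target are all assumptions made for the example.

```python
def num_output_partitions(total_bytes, target_partition_bytes,
                          min_partitions=1, max_partitions=None):
    """Hypothetical helper: choose a partition count so that each
    partition holds roughly target_partition_bytes of data."""
    if target_partition_bytes <= 0:
        raise ValueError("target_partition_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding.
    n = -(-total_bytes // target_partition_bytes)
    n = max(min_partitions, n)
    if max_partitions is not None:
        n = min(n, max_partitions)
    return n


# 10 GiB of data with an assumed 128 MiB target per partition.
print(num_output_partitions(10 * 1024**3, 128 * 1024**2))  # 80
```

      A value computed this way could then be passed as the `numPartitions` argument that Spark shuffle operations such as `groupByKey` accept, which bounds the size of each output partition and so reduces the risk of out-of-memory errors.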

            People

            • Assignee: Matthias Boehm (mboehm7)
            • Reporter: Matthias Boehm (mboehm7)