Spark / SPARK-46383

Reduce Driver Heap Usage by Reducing the Lifespan of `TaskInfo.accumulables()`


Details

    Description

      `AccumulableInfo` is one of the top heap consumers in driver heap dumps for stages with many tasks. For a stage with a large number of tasks (O(100k)), we saw 30% of the heap usage stemming from `TaskInfo.accumulables()`.

       

      Today the `TaskSetManager` retains the `TaskInfo` objects (ref1, ref2), and with them the task metrics (`AccumulableInfo`), for every task attempt until the stage completes. As a result, for stages with a large number of tasks, the metrics for every task stay on the heap even after the task has completed and its metrics have been aggregated. Because each task carries a large number of metrics, stages with many tasks end up with high driver heap usage in the form of task metrics.
      Ideally, we should clear a task's `TaskInfo` upon the task's completion, thereby reducing the driver's heap usage.
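
      To make the retention pattern concrete, here is a minimal toy model in Python. It is not Spark's Scala code: `ToyTaskSetManager`, `on_task_completed`, `drop_on_completion`, and `run_stage` are hypothetical names invented for illustration, and the task and metric counts are arbitrary. It only sketches the idea that per-task metrics can be released once they have been folded into stage-level totals.

```python
from dataclasses import dataclass, field


@dataclass
class AccumulableInfo:
    """Toy stand-in for one task metric (name and value only)."""
    name: str
    value: int


@dataclass
class TaskInfo:
    """Toy stand-in for the per-attempt bookkeeping object."""
    task_id: int
    accumulables: list = field(default_factory=list)


class ToyTaskSetManager:
    """Hypothetical model: holds one TaskInfo per attempt until the stage ends."""

    def __init__(self, drop_on_completion):
        self.drop_on_completion = drop_on_completion
        self.task_infos = {}    # task_id -> TaskInfo, retained for the whole stage
        self.stage_totals = {}  # metric name -> aggregated value

    def on_task_completed(self, info):
        self.task_infos[info.task_id] = info
        # Fold the task's metrics into the stage-level totals first.
        for acc in info.accumulables:
            self.stage_totals[acc.name] = self.stage_totals.get(acc.name, 0) + acc.value
        # After aggregation the per-task copies are redundant, so release them.
        if self.drop_on_completion:
            info.accumulables = []

    def retained_accumulables(self):
        """Count the AccumulableInfo objects still reachable from retained TaskInfos."""
        return sum(len(i.accumulables) for i in self.task_infos.values())


def run_stage(num_tasks, metrics_per_task, drop_on_completion):
    """Simulate a stage in which every task reports the same set of metrics."""
    tsm = ToyTaskSetManager(drop_on_completion)
    for task_id in range(num_tasks):
        accs = [AccumulableInfo(f"metric_{m}", 1) for m in range(metrics_per_task)]
        tsm.on_task_completed(TaskInfo(task_id, accs))
    return tsm


if __name__ == "__main__":
    keep = run_stage(1000, 20, drop_on_completion=False)
    drop = run_stage(1000, 20, drop_on_completion=True)
    # Same aggregated totals either way; only the retained object count differs.
    print(keep.retained_accumulables(), drop.retained_accumulables())  # 20000 0
```

      In this sketch both runs produce identical stage totals, while the number of retained metric objects falls from tasks × metrics to zero. Whether the real scheduler can release the data this early depends on which listeners and code paths still read `TaskInfo.accumulables()` after completion, which the toy model does not capture.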

      Attachments

        1. screenshot-1.png
          225 kB
          Utkarsh Agarwal
        2. Screenshot 2023-11-06 at 3.56.26 PM.png
          226 kB
          Utkarsh Agarwal


            People

              utkarsh39 Utkarsh Agarwal