Spark / SPARK-20923

TaskMetrics._updatedBlockStatuses uses a lot of memory


    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.3.0
    • Component/s: Spark Core
    • Labels:


      The driver appears to use a large amount of memory in certain cases to store the task metrics' updated block statuses. For instance, I had a user reading data from Hive and caching it. The number of tasks to read was around 62,000, they were using 1,000 executors, and it ended up caching a couple of TB of data. The driver kept running out of memory.

      I investigated, and it looks like 5 GB of a 10 GB heap was being used up by TaskMetrics._updatedBlockStatuses because there are a lot of blocks.

      updatedBlockStatuses was already removed from the task end event in SPARK-20084. I don't see anything else that uses it. Does anybody know if I missed something?

      If it's not being used, we should remove it; otherwise we need to find a better way of doing it so it doesn't use so much memory.
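
      To see how the figures reported above could plausibly add up to several GB on the driver, here is a hedged back-of-envelope sketch. The entries-per-task and bytes-per-entry values are assumptions chosen for illustration, not measurements from the actual heap dump; only the task count comes from the report.

      ```python
      # Hypothetical back-of-envelope estimate of how much driver heap the
      # retained (BlockId, BlockStatus) entries could occupy. ENTRIES_PER_TASK
      # and BYTES_PER_ENTRY are assumed values, not measured ones.

      TASKS = 62_000            # task count from the report above
      ENTRIES_PER_TASK = 200    # assumed (BlockId, BlockStatus) pairs kept per task
      BYTES_PER_ENTRY = 400     # assumed retained size of one pair on the JVM heap

      def retained_gib(tasks: int, entries_per_task: int, bytes_per_entry: int) -> float:
          """Total heap retained if every task's block statuses stay reachable."""
          return tasks * entries_per_task * bytes_per_entry / 1024**3

      print(f"{retained_gib(TASKS, ENTRIES_PER_TASK, BYTES_PER_ENTRY):.1f} GiB")
      ```

      Even with modest per-task numbers, retention scales linearly with the task count, so a 62,000-task caching job can easily reach the ~5 GB range observed.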





              • Assignee:
                tgraves Thomas Graves
              • Votes:
                0 Vote for this issue
              • Watchers:
                6 Start watching this issue


                • Created: