The driver appears to use a large amount of memory in certain cases to store the updated block statuses in the task metrics. For instance, I had a user reading data from Hive and caching it. The read took around 62,000 tasks, they were using 1000 executors, and the job ended up caching a couple of TB of data. The driver kept running out of memory.
I investigated, and it looks like 5GB of a 10GB heap was being used by TaskMetrics._updatedBlockStatuses, because there are a lot of blocks.
The updatedBlockStatuses was already removed from the task end event under SPARK-20084. I don't see anything else that seems to be using this. Does anybody know if I missed something?
If it's not being used, we should remove it; otherwise we need to figure out a better way of doing it so that it doesn't use so much memory.