[SPARK-32795] ApplicationInfo#removedExecutors can cause OOM - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: Spark Core
Labels: None

Description

In my case, the standalone Spark master process had a maximum heap of 1 GB. Of that, 738 MB was consumed by ExecutorDesc objects, the vast majority of which were the 18.5 million entries in removedExecutors. This caused the master to run out of memory (OOM) and left the application driver process dangling.
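As a rough sanity check on the figures above, the retained size per entry can be estimated by dividing the heap consumed by the number of entries. This is only a back-of-the-envelope sketch: it assumes 1 MB = 10^6 bytes and attributes all 738 MB to removedExecutors, so the result is an upper bound on the per-entry cost.

```python
# Figures taken from the report; treating MB as 10**6 bytes is an assumption.
heap_used_bytes = 738 * 10**6
removed_executors = 18_500_000

# Upper bound: not every ExecutorDesc on the heap was in removedExecutors.
bytes_per_entry = heap_used_bytes / removed_executors
print(f"about {bytes_per_entry:.0f} bytes retained per ExecutorDesc")
```

Each entry is small, so the problem is the unbounded count rather than the size of any single object.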

The root cause was that the worker node ran out of disk space. For reasons I have not determined, it then went into a fast, endless loop of launching new executors, each of which crashed in turn. The count reached about 18 million removed executors before the master could no longer hold the history in memory.
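One possible mitigation is to cap how many removed executors are retained per application. The sketch below models that idea in Python and is not Spark's actual code (the master is written in Scala). `ApplicationInfoSketch`, its methods, and the `max_removed_retained` limit are hypothetical names used only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorDesc:
    """Hypothetical stand-in for the master's per-executor record."""
    executor_id: int
    cores: int


class ApplicationInfoSketch:
    """Hypothetical model of per-application executor bookkeeping."""

    def __init__(self, max_removed_retained: int = 1000):
        self.executors = {}
        # A deque with maxlen drops its oldest entry once the limit is
        # reached, so the history can never grow without bound.
        self.removed_executors = deque(maxlen=max_removed_retained)

    def add_executor(self, desc: ExecutorDesc) -> None:
        self.executors[desc.executor_id] = desc

    def remove_executor(self, executor_id: int) -> None:
        desc = self.executors.pop(executor_id, None)
        if desc is not None:
            self.removed_executors.append(desc)


# Simulate a crash loop: every launched executor is removed right away.
app = ApplicationInfoSketch(max_removed_retained=3)
for i in range(10):
    app.add_executor(ExecutorDesc(executor_id=i, cores=1))
    app.remove_executor(i)

print(len(app.removed_executors))  # only the most recent entries remain
```

With a cap like this, a crash loop costs a fixed amount of memory regardless of how long it runs. It does not address the loop itself, which would still need its own retry limit or backoff.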

Attachments

image-2020-09-03-23-27-11-809.png (216 kB), added 04/Sep/20 04:27 by Victor Tso

People

Assignee: Unassigned

Reporter: Victor Tso

Votes: 0

Watchers: 3

Dates

Created: 04/Sep/20 04:26

Updated: 12/Dec/22 18:10