[SPARK-11022] Spark Worker needs to clean up finished executors when an app has massive failures - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 1.4.0
Fix Version/s: None
Component/s: Spark Core
Labels:
- bulk-closed

Description

The Worker process often goes down even though no tasks are behaving abnormally; it simply crashes without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_HOME}/logs", a heap dump showed: "17,010 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8" occupy 496,706,920 (96.14%) bytes."
Almost all of these instances were held by a single "org.apache.spark.deploy.worker.Worker" instance, whose finishedExecutors field holds many ExecutorRunner objects.

The code in Worker.scala only ever uses finishedExecutors as "finishedExecutors(fullId) = executor" and "finishedExecutors.values.toList". Nothing removes an executor, so every finished executor stays in memory. After the Worker receives many executor status reports, this can cause a crash. I think this needs to be improved.
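One possible way to bound this growth is to cap the number of finished executors that are retained and evict the oldest entries first. The following is only a minimal sketch of that idea, not the actual Worker code: FinishedExecutor, FinishedExecutorStore and the retained parameter are hypothetical names introduced for illustration, and FinishedExecutor stands in for ExecutorRunner.

```scala
import scala.collection.mutable

// Hypothetical stand-in for ExecutorRunner; only the id matters for this sketch.
final case class FinishedExecutor(fullId: String)

// Hypothetical store that keeps at most `retained` finished executors,
// evicting the oldest entries first.
final class FinishedExecutorStore(retained: Int) {
  require(retained > 0, "retained must be positive")

  // LinkedHashMap iterates in insertion order, so the first keys are the oldest.
  private val finishedExecutors =
    mutable.LinkedHashMap.empty[String, FinishedExecutor]

  def add(executor: FinishedExecutor): Unit = {
    // Remove first so that re-adding an existing id moves it to the newest position.
    finishedExecutors.remove(executor.fullId)
    finishedExecutors(executor.fullId) = executor
    trim()
  }

  private def trim(): Unit = {
    val excess = finishedExecutors.size - retained
    if (excess > 0) {
      // Copy the oldest keys into a List before removing, to avoid
      // mutating the map while iterating over it.
      val oldest = finishedExecutors.keys.take(excess).toList
      oldest.foreach(id => finishedExecutors.remove(id))
    }
  }

  def values: List[FinishedExecutor] = finishedExecutors.values.toList

  def size: Int = finishedExecutors.size
}
```

With a cap like this, the memory held by finished executors stays proportional to retained rather than to the total number of executors that have ever finished on the Worker.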
Thanks and best regards.

People

Assignee: Unassigned

Reporter: colin shaw

Votes: 0

Watchers: 2

Dates

Created: 09/Oct/15 07:18

Updated: 21/May/19 04:33

Resolved: 21/May/19 04:33