Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version: 1.6.0
Description
Currently, YarnAllocator updates its managed state (such as numExecutorsRunning) as soon as a container is allocated, before the executor has been launched successfully.
The launch can fail when the Spark configuration is wrong (for example, the spark_shuffle aux-service is occasionally not configured on the NodeManager), or when the NodeManager is lost while NMClient is communicating with it.
In the current implementation, the state is updated even when the executor fails to launch, which leaves the AM with an incorrect view of the cluster. In addition, the lingering container is only released after a timeout, which wastes resources.
So the state should be updated only after the executor has launched successfully; otherwise the container should be released as soon as possible, so that the failure is detected quickly and the executor can be retried.
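The proposed fix can be illustrated with a minimal sketch. This is not Spark's actual YarnAllocator code; the class, field, and method names below are hypothetical stand-ins chosen to mirror the description: the running-executor count is incremented only after the launch succeeds, and a failed container is released immediately instead of lingering until a timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the proposed state handling (names are hypothetical).
class AllocatorSketch {
    int numExecutorsRunning = 0;
    List<String> releasedContainers = new ArrayList<>();

    // launchSucceeds stands in for the outcome of starting the executor
    // via NMClient; in real code this would be a remote call that may throw.
    void runAllocatedContainer(String containerId, boolean launchSucceeds) {
        try {
            if (!launchSucceeds) {
                throw new RuntimeException("executor failed to launch in " + containerId);
            }
            // Only after a successful launch is it safe to count the executor.
            numExecutorsRunning++;
        } catch (RuntimeException e) {
            // Fail fast: release the container right away so the resource is
            // returned to YARN and the executor can be retried, rather than
            // waiting for the container to time out.
            releasedContainers.add(containerId);
        }
    }
}

public class Main {
    public static void main(String[] args) {
        AllocatorSketch a = new AllocatorSketch();
        a.runAllocatedContainer("container_01", true);
        a.runAllocatedContainer("container_02", false); // e.g. aux-service misconfigured
        System.out.println(a.numExecutorsRunning);       // counts only the successful launch
        System.out.println(a.releasedContainers.size()); // the failed container was released
    }
}
```

The key design point is that the try/catch boundary sits around the launch itself, so the state update and the release are mutually exclusive outcomes of the same attempt.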
Issue Links
- breaks
  - SPARK-21383 YARN can allocate too many executors (Resolved)
- links to