[SPARK-12552] Recovered driver's resource is not counted in the Master - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0
Fix Version/s: 2.2.0, 2.3.0
Component/s: Deploy, Spark Core
Labels: None

Description

In the current Standalone Master HA implementation, when an application is submitted in cluster mode, the driver's resources (CPU cores and memory) are not counted again after recovery from a failure. This leads to unexpected behavior, such as more executors than expected and negative core and memory usage in the web UI. In addition, a recovered application's state is always WAITING; it should be changed to RUNNING once the application is fully recovered.
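The accounting problem can be illustrated with a toy model. This is a minimal sketch, not Spark's Master code: the `Worker` class, `recover_worker`, and `simulate` are hypothetical names, and the 8-core worker running a 2-core, 1024 MB driver is an invented example. It only shows why skipping the driver during recovery produces both symptoms: too many cores appear free, and usage goes negative once the driver exits.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Toy stand-in for the Master's per-worker bookkeeping (hypothetical)."""
    cores: int
    memory_mb: int
    cores_used: int = 0
    memory_used_mb: int = 0

    def add_driver(self, cores: int, memory_mb: int) -> None:
        # Charge a driver's resources against this worker.
        self.cores_used += cores
        self.memory_used_mb += memory_mb

    def remove_driver(self, cores: int, memory_mb: int) -> None:
        # Release a driver's resources when it finishes.
        self.cores_used -= cores
        self.memory_used_mb -= memory_mb

    @property
    def cores_free(self) -> int:
        return self.cores - self.cores_used


def recover_worker(cores, memory_mb, running_drivers, count_drivers):
    """Rebuild a worker's bookkeeping after recovery.

    running_drivers is a list of (cores, memory_mb) tuples for drivers
    still running on the worker. With count_drivers=False the drivers
    are skipped, which mimics the behavior described in this issue.
    """
    worker = Worker(cores, memory_mb)
    if count_drivers:
        for d_cores, d_mem in running_drivers:
            worker.add_driver(d_cores, d_mem)
    return worker


def simulate(count_drivers):
    """Return (free cores after recovery, used cores after the driver exits)."""
    drivers = [(2, 1024)]  # assumed: one driver using 2 cores and 1024 MB
    worker = recover_worker(8, 16384, drivers, count_drivers)
    free_after_recovery = worker.cores_free
    for d_cores, d_mem in drivers:
        worker.remove_driver(d_cores, d_mem)
    return free_after_recovery, worker.cores_used


if __name__ == "__main__":
    print("driver not counted:", simulate(count_drivers=False))
    print("driver counted:    ", simulate(count_drivers=True))
```

In this model, skipping the driver leaves all 8 cores looking free after recovery, so a scheduler could place more executors than the worker can actually hold, and the used-core count drops to -2 when the driver finishes. Counting the driver during recovery leaves 6 cores free and returns usage to 0.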

Issue Links

is duplicated by

SPARK-21169 Spark HA: Jobs state is in WAITING status after reconnecting to standby master (Resolved)

SPARK-18554 leader master lost the leadership, when the slave become master, the perivious app's state display as waitting (Resolved)

SPARK-20058 the running application status changed from running to waiting when a master is down and it change to another standy by master (Resolved)

links to

[Github] Pull Request #10506 (jerryshao)

[Github] Pull Request #18321 (jerryshao)

People

Assignee: Saisai Shao

Reporter: Saisai Shao

Votes: 0

Watchers: 3

Dates

Created: 29/Dec/15 09:39

Updated: 03/Jul/17 11:15

Resolved: 14/Jun/17 00:13