Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 2.0.0-alpha
- None
- None
Description
This umbrella JIRA tracks the work needed to preserve critical state information and reload it upon ResourceManager (RM) restart.
Attachments
Issue Links
- depends upon
  - MAPREDUCE-5086 MR app master deletes staging dir when sent a reboot command from the RM (Closed)
- is blocked by
  - MAPREDUCE-5505 Clients should be notified job finished only after job successfully unregistered (Closed)
  - YARN-1082 Secure RM with recovery enabled and rm state store on hdfs fails with gss exception (Closed)
  - YARN-495 Change NM behavior of reboot to resync (Closed)
- is related to
  - MAPREDUCE-5476 Job can fail when RM restarts after staging dir is cleaned but before MR successfully unregister with RM (Closed)
- relates to
  - MAPREDUCE-5471 Succeed job tries to restart after RMrestart (Resolved)
  - MAPREDUCE-5127 MR job succeeds and exits even when unregister with RM fails (Resolved)
  - MAPREDUCE-5466 Historyserver does not refresh the result of restarted jobs after RM restart (Closed)
  - YARN-1305 RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException (Closed)
  - MAPREDUCE-5472 reducer of sort job restarts from scratch in between after RM restart (Resolved)
  - MAPREDUCE-5567 [Umbrella] Stabilize MR framework w.r.t ResourceManager restart (Resolved)
  - YARN-209 Capacity scheduler doesn't trigger app-activation after adding nodes (Closed)
  - YARN-479 NM retry behavior for connection to RM should be similar for lost heartbeats (Closed)
  - YARN-149 [Umbrella] ResourceManager (RM) Fail-over (Resolved)
  - YARN-218 Distiguish between "failed" and "killed" app attempts (Resolved)
  - YARN-556 [Umbrella] RM Restart phase 2 - Work preserving restart (Resolved)
  - YARN-1139 [Umbrella] Convert all RM components to Services (Open)
I will post a preliminary design sketch this week for comments.