| YARN-3019 | YARN-556 | Make work-preserving-recovery the default mechanism for RM recovery | Jian He | Jian He | Closed | Fixed |
| YARN-2994 | YARN-556 | Document work-preserving RM restart | Jian He | Jian He | Closed | Fixed |
| YARN-2822 | YARN-556 | NPE when RM tries to transfer state from previous attempt on recovery | Jian He | Jian He | Resolved | Fixed |
| YARN-2712 | YARN-556 | TestWorkPreservingRMRestart: Augment FS tests with queue and headroom checks | Tsuyoshi Ozawa | Tsuyoshi Ozawa | Closed | Fixed |
| YARN-2674 | YARN-556 | Distributed shell AM may re-launch containers if RM work preserving restart happens | Shane Kumpf | Chun Chen | Resolved | Fixed |
| YARN-2567 | YARN-556 | Add a percentage-node threshold for RM to wait for new allocations after restart/failover | Vinod Kumar Vavilapalli | Vinod Kumar Vavilapalli | Open | Unresolved |
| YARN-2558 | YARN-556 | Updating ContainerTokenIdentifier#read/write to use ContainerId#getContainerId | Tsuyoshi Ozawa | Tsuyoshi Ozawa | Closed | Fixed |
| YARN-2515 | YARN-556 | Update ConverterUtils#toContainerId to parse epoch | Tsuyoshi Ozawa | Tsuyoshi Ozawa | Closed | Fixed |
| YARN-2456 | YARN-556 | Possible livelock in CapacityScheduler when RM is recovering apps | Jian He | Jian He | Closed | Fixed |
| YARN-2434 | YARN-556 | RM should not recover containers from previously failed attempt when AM restart is not enabled | Jian He | Jian He | Closed | Fixed |
| YARN-2312 | YARN-556 | Marking ContainerId#getId as deprecated | Tsuyoshi Ozawa | Tsuyoshi Ozawa | Closed | Fixed |
| YARN-2260 | YARN-556 | Add containers to launchedContainers list in RMNode on container recovery | Jian He | Jian He | Closed | Fixed |
| YARN-2249 | YARN-556 | AM release request may be lost on RM restart | Jian He | Jian He | Closed | Fixed |
| YARN-2229 | YARN-556 | ContainerId can overflow with RM restart | Tsuyoshi Ozawa | Tsuyoshi Ozawa | Closed | Fixed |
| YARN-2182 | YARN-556 | Update ContainerId#toString() to avoid conflicts before and after RM restart | Tsuyoshi Ozawa | Tsuyoshi Ozawa | Closed | Fixed |
| YARN-2153 | YARN-556 | Ensure distributed shell work with RM work-preserving recovery | Jian He | Jian He | Closed | Fixed |
| YARN-2152 | YARN-556 | Recover missing container information | Jian He | Jian He | Closed | Fixed |
| YARN-2115 | YARN-556 | Replace RegisterNodeManagerRequest#ContainerStatus with a new NMContainerStatus | Jian He | Jian He | Closed | Fixed |
| YARN-2052 | YARN-556 | ContainerId creation after work preserving restart is broken | Tsuyoshi Ozawa | Tsuyoshi Ozawa | Closed | Fixed |
| YARN-2017 | YARN-556 | Merge some of the common lib code in schedulers | Jian He | Jian He | Closed | Fixed |
| YARN-2001 | YARN-556 | Threshold for RM to accept requests from AM after failover | Jian He | Jian He | Closed | Fixed |
| YARN-2000 | YARN-556 | Fix ordering of starting services inside the RM | Jian He | Jian He | Resolved | Invalid |
| YARN-1879 | YARN-556 | Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over | Tsuyoshi Ozawa | Jian He | Closed | Fixed |
| YARN-1823 | YARN-556 | Recover Unmanaged AMs | Anubhav Dhoot | Karthik Kambatla | Resolved | Duplicate |
| YARN-1822 | YARN-556 | Revisit AM link being broken for work preserving restart | Unassigned | Robert Kanter | Resolved | Invalid |
| YARN-1373 | YARN-556 | Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps | Omkar Vinit Joshi | Bikas Saha | Resolved | Duplicate |
| YARN-1372 | YARN-556 | Ensure all completed containers are reported to the AMs across RM restart | Anubhav Dhoot | Bikas Saha | Closed | Fixed |
| YARN-1371 | YARN-556 | FIFO scheduler to re-populate container allocation state | Jian He | Bikas Saha | Resolved | Duplicate |
| YARN-1370 | YARN-556 | Fair scheduler to re-populate container allocation state | Anubhav Dhoot | Bikas Saha | Closed | Fixed |
| YARN-1369 | YARN-556 | Capacity scheduler to re-populate container allocation state | Jian He | Bikas Saha | Resolved | Duplicate |
| YARN-1368 | YARN-556 | Common work to re-populate containers' state into scheduler | Jian He | Bikas Saha | Closed | Fixed |
| YARN-1367 | YARN-556 | After restart NM should resync with the RM without killing containers | Anubhav Dhoot | Bikas Saha | Closed | Fixed |
| YARN-1366 | YARN-556 | AM should implement Resync with the ApplicationMasterService instead of shutting down | Rohith Sharma K S | Bikas Saha | Closed | Fixed |
| YARN-1365 | YARN-556 | ApplicationMasterService to allow Register of an app that was running before restart | Anubhav Dhoot | Bikas Saha | Closed | Fixed |