Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
If external systems such as ATS or ZK become very slow, draining all the events takes a long time. If draining takes longer than 10 minutes, all applications will expire. Fixes include:
1. Add a timeout and stop the dispatcher even if not all events are drained.
2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush its events when transitioning to standby.
3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that RM is stopping/transitioning.
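For illustration only, fix 1 could look roughly like the sketch below. It is not the actual patch and not YARN's AsyncDispatcher: the names `DrainTimeoutSketch`, `SketchDispatcher`, `slowEvent`, and the `drainTimeoutMs` parameter are all hypothetical. The idea is that `stop` waits for the queue to empty only until a deadline, then shuts the worker down regardless.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class DrainTimeoutSketch {

    /** Hypothetical stand-in for an event dispatcher; not YARN's AsyncDispatcher. */
    static class SketchDispatcher {
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        private volatile boolean stopped = false;
        private final Thread worker;

        SketchDispatcher() {
            worker = new Thread(() -> {
                while (!stopped) {
                    try {
                        // Poll with a short wait so the loop re-checks the stop flag.
                        Runnable event = queue.poll(10, TimeUnit.MILLISECONDS);
                        if (event != null) {
                            event.run();
                        }
                    } catch (InterruptedException e) {
                        return; // interrupted by stop(): exit the worker
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        void dispatch(Runnable event) {
            queue.add(event);
        }

        /**
         * Waits up to drainTimeoutMs for the queue to empty, then stops the
         * worker even if events remain. Returns true only if the queue was
         * empty when the wait ended. An event that is still being handled
         * at that point is interrupted.
         */
        boolean stop(long drainTimeoutMs) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(drainTimeoutMs);
            try {
                while (!queue.isEmpty() && System.nanoTime() - deadline < 0) {
                    Thread.sleep(5);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            boolean drained = queue.isEmpty();
            stopped = true;
            worker.interrupt();
            try {
                worker.join(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return drained;
        }
    }

    /** Simulates a handler blocked on a slow external system (for example ATS or ZK). */
    static Runnable slowEvent(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the flag so the worker exits
            }
        };
    }

    public static void main(String[] args) {
        SketchDispatcher dispatcher = new SketchDispatcher();
        for (int i = 0; i < 5; i++) {
            dispatcher.dispatch(slowEvent(300));
        }
        // The timeout is much shorter than the total handling time,
        // so stop() gives up on draining and returns promptly.
        System.out.println("drained before timeout: " + dispatcher.stop(100));
    }
}
```

With a bounded wait like this, a slow ATS or ZK delays the RM transition by at most the configured timeout instead of the full drain time.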
Attachments
Issue Links
- is related to: YARN-4153 TestAsyncDispatcher failed at branch-2.7 (Resolved)