Description
Slider AM restart is failing (SLIDER-34). The AM comes back up, but it cannot create new containers.
The Slider minicluster test TestKilledAM reproduces this reliably: it kills the AM, then kills a container while the AM is down. That triggers a reallocation of the container once the AM is back, and the reallocation is what fails.
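The sequence that TestKilledAM exercises can be sketched as a toy model. This is only an illustration of the ordering of events, not the actual test and not the Slider or YARN API: `ToyCluster`, its methods, and `killed_am_scenario` are hypothetical names made up for this sketch. In the toy model the replacement request simply succeeds; in the real failure that request is the step that breaks.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical stand-in for a minicluster; not the Slider or YARN API."""
    desired: int                               # containers the application wants
    live: set = field(default_factory=set)     # ids of running containers
    am_running: bool = True
    next_id: int = 0

    def allocate(self) -> int:
        # Only a running AM can ask for a container.
        if not self.am_running:
            raise RuntimeError("AM is down")
        cid = self.next_id
        self.next_id += 1
        self.live.add(cid)
        return cid

    def kill_am(self) -> None:
        # Running containers are assumed to survive the AM outage.
        self.am_running = False

    def kill_container(self, cid: int) -> None:
        self.live.discard(cid)

    def restart_am(self) -> int:
        """Restart the AM and return how many containers it must re-request."""
        self.am_running = True
        return self.desired - len(self.live)


def killed_am_scenario() -> int:
    cluster = ToyCluster(desired=2)
    first = cluster.allocate()
    cluster.allocate()
    cluster.kill_am()
    cluster.kill_container(first)   # lost while no AM is watching
    deficit = cluster.restart_am()  # restarted AM sees one container missing
    return deficit
```

Calling `killed_am_scenario()` returns the number of containers the restarted AM has to request again, which is the reallocation described above.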
Attachments
Issue Links
- duplicates
  - YARN-2371 Wrong NMToken is issued when NM preserving restarts with containers running (Closed)
  - YARN-2433 Stale token used by restarted AM (with previous containers retained) to request new container (Closed)
- is depended upon by
  - SLIDER-66 AM Restart not working -YARN issues (Resolved)
- relates to
  - SLIDER-34 Restarted AM cannot create containers (Resolved)
  - FLINK-4142 Recovery problem in HA on Hadoop Yarn 2.4.1 (Closed)