Description
Steps:
1) Launch httpd-docker
2) Wait for app to be in STABLE state
3) Run validation for app (It takes around 3 mins)
4) Stop all ZKs
5) Wait 60 sec
6) Kill AM
7) Wait 30 sec
8) Start all ZKs
9) Wait for application to finish
10) Validate expected containers of the app
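The fault-injection part of the steps above (4 through 8) could be scripted roughly as follows. This is only a sketch: the `cluster` object and its `stop_all_zks`, `kill_am`, and `start_all_zks` methods are hypothetical stand-ins for whatever the test harness provides, not a real Hadoop or YARN API.

```python
import time


def run_fault_sequence(cluster, sleep=time.sleep):
    """Run steps 4-8: stop ZKs, wait, kill the AM, wait, restart ZKs.

    `cluster` is a hypothetical harness object; its method names are
    assumptions made for this sketch.
    """
    cluster.stop_all_zks()   # step 4: stop all ZKs
    sleep(60)                # step 5: wait 60 sec
    cluster.kill_am()        # step 6: kill the AM while ZK is down
    sleep(30)                # step 7: wait 30 sec
    cluster.start_all_zks()  # step 8: start all ZKs again
```

The `sleep` parameter is injectable so the sequence can be dry-run without actually waiting.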
Expected behavior:
A new AM attempt should start, and the docker containers launched by the 1st attempt should be recovered by the new attempt.
Actual behavior:
A new AM attempt starts, but it cannot recover the docker containers of the 1st attempt because it cannot read the component details from ZK.
Thus, it releases the container from the previous attempt and requests new containers for all instances.
2018-07-19 22:42:47,595 [main] INFO service.ServiceScheduler - Registering appattempt_1531977563978_0015_000002, fault-test-zkrm-httpd-docker into registry
2018-07-19 22:42:47,611 [main] INFO service.ServiceScheduler - Received 1 containers from previous attempt.
2018-07-19 22:42:47,642 [main] INFO service.ServiceScheduler - Could not read component paths: `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
2018-07-19 22:42:47,643 [main] INFO service.ServiceScheduler - Handling container_e08_1531977563978_0015_01_000003 from previous attempt
2018-07-19 22:42:47,643 [main] INFO service.ServiceScheduler - Record not found in registry for container container_e08_1531977563978_0015_01_000003 from previous attempt, releasing
2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
2018-07-19 22:42:47,651 [main] INFO service.ServiceScheduler - Triggering initial evaluation of component httpd
2018-07-19 22:42:47,652 [main] INFO component.Component - [INIT COMPONENT httpd]: 2 instances.
2018-07-19 22:42:47,652 [main] INFO component.Component - [COMPONENT httpd] Requesting for 2 container(s)
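The log above suggests the following recovery decision: each container reported from the previous attempt is looked up in the registry, and a container with no record is released. A minimal sketch of that decision, not the actual ServiceScheduler code; the function name `recover_containers` and the dict-based registry are hypothetical stand-ins chosen for illustration:

```python
def recover_containers(previous_containers, registry_records):
    """Split containers from a previous attempt into recovered and released.

    previous_containers: container IDs reported from the previous attempt.
    registry_records: hypothetical mapping of container ID -> component name,
        standing in for the component records read from the ZK registry.
    """
    recovered, released = {}, []
    for container_id in previous_containers:
        component = registry_records.get(container_id)
        if component is None:
            # No registry record: the container cannot be matched to a
            # component, so it is released (the case seen in the log).
            released.append(container_id)
        else:
            recovered[container_id] = component
    return recovered, released


# Scenario from the log: the registry read failed, so no records are available.
recovered, released = recover_containers(
    ["container_e08_1531977563978_0015_01_000003"], {}
)
```

With an empty set of records, every previous container ends up in `released`, which matches the reported behavior of the AM requesting fresh containers for the component.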
Issue Links
- is related to YARN-6168 Restarted RM may not inform AM about all existing containers (Resolved)