Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
Scenario:
1) Run a long-running Teragen job.
2) Find the node where the AM has started.
3) Kill the NodeManager on the AM host using kill -9.
Expected:
A 2nd AM should be started and the job should resume. The job should also keep running on the client side.
Actual:
The 1st AM was started, and then the NM running that AM was killed. The job waited around 10 minutes before the 2nd AM attempt was started. At that same moment, the client-side job output reported "job failed" and the client exited, even though the RM had already started the 2nd AM. The 2nd AM eventually ran and the job finished successfully.
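To make the gap between expected and actual behavior concrete, here is a minimal sketch of the decision the client would need to make. It is not the Tez or YARN client code: AttemptState, client_job_status, and max_am_attempts are hypothetical names used only for illustration, and the default of 2 attempts is an assumption.

```python
from enum import Enum


class AttemptState(Enum):
    """Hypothetical states of a single AM attempt, as seen by the client."""
    RUNNING = "RUNNING"
    FAILED = "FAILED"
    SUCCEEDED = "SUCCEEDED"


def client_job_status(attempt_states, max_am_attempts=2):
    """Return the job status the client should report.

    attempt_states lists the AM attempt states in launch order. It stands in
    for whatever the client learns from the RM (an assumption of this sketch).
    """
    if not attempt_states:
        return "PENDING"
    latest = attempt_states[-1]
    if latest is AttemptState.SUCCEEDED:
        return "SUCCEEDED"
    if latest is AttemptState.RUNNING:
        return "RUNNING"
    # The latest attempt failed. This is terminal only when no retries remain.
    if len(attempt_states) >= max_am_attempts:
        return "FAILED"
    # Otherwise keep waiting for the RM to launch the next AM attempt.
    return "RUNNING"
```

Under this sketch, a failed 1st attempt alone does not produce "FAILED" on the client. The reported behavior corresponds to the client treating the 1st attempt's failure as terminal while the RM was still launching the 2nd attempt.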
Issue Links
- duplicates: TEZ-36 TaskScheduler should not unregister from the RM when sent a reboot (Closed)