Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
trunk
None
None
Description
Reported by mchiang.
Scenario (2 Oozie HA servers):
21:38:56 submit job at oozie client
21:41:42 shut down server1
21:46:52 shut down server2
21:47:30 start server1
22:15:05 start server2
The last fork path ended at 21:52:53.
22:36:48 the job is still RUNNING and has not moved to the join node.
Digging into the logs, the locking part seems to work fine: forked action processing is distributed between the two servers when both are running, and continues when one of them is down. The open question is why even RecoveryService fails to pick up the job after all the forks have completed.
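To make the timeline easier to reason about, here is a small sketch that computes the gaps between the reported events. It assumes all timestamps fall on the same day; the helper `seconds_between` and the variable names are made up for illustration and are not part of Oozie.

```python
from datetime import datetime

FMT = "%H:%M:%S"

def seconds_between(start: str, end: str) -> int:
    """Elapsed seconds between two same-day HH:MM:SS timestamps (assumption: same day)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

# Timestamps copied from the scenario above.
SERVER2_DOWN = "21:46:52"
SERVER1_UP = "21:47:30"
LAST_FORK_END = "21:52:53"
SERVER2_UP = "22:15:05"
LAST_CHECK = "22:36:48"

# Window in which no Oozie server was running.
both_down = seconds_between(SERVER2_DOWN, SERVER1_UP)

# Window in which only server1 was running; the last fork path ended inside it.
server1_alone = seconds_between(SERVER1_UP, SERVER2_UP)

# How long the job stayed RUNNING after the last fork path ended.
stuck = seconds_between(LAST_FORK_END, LAST_CHECK)

# Portion of that time during which both servers were up again.
stuck_with_both_up = seconds_between(SERVER2_UP, LAST_CHECK)

print(both_down, server1_alone, stuck, stuck_with_both_up)
```

By these numbers, the last fork path finished while only server1 was up, and the job then sat in RUNNING for roughly 44 minutes, about 22 of them with both servers back online.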