If the JobTracker (JT) restarts or dies so that running jobs are lost, or if the JT is unreachable, Oozie's ActionCheckXCommand never fails the workflow job.
There seem to be 2 issues here:
- convertException no longer receives the root cause exception; it always receives a HadoopAccessorException wrapping the root cause. We should modify convertException to inspect the cause exception as well.
- ActionCheckXCommand does not apply the retry handling logic that ActionStartXCommand has.
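
A minimal sketch of the first point, showing how a classifier could walk the cause chain instead of looking only at the outermost exception. This is not Oozie's actual code: CauseAwareClassifier, ErrorType, WrapperException, and classify are hypothetical names used for illustration, WrapperException stands in for HadoopAccessorException, and treating ConnectException as transient is an assumption made for the example.

```java
import java.net.ConnectException;

final class CauseAwareClassifier {
    // Hypothetical error categories; not Oozie's real enum.
    enum ErrorType { TRANSIENT, ERROR }

    // Hypothetical stand-in for a wrapper such as HadoopAccessorException.
    static class WrapperException extends Exception {
        WrapperException(Throwable cause) {
            super("wrapped", cause);
        }
    }

    // Upper bound on how far we follow getCause(), as a guard against cyclic chains.
    private static final int MAX_DEPTH = 16;

    // Inspect the exception and each of its causes, not only the outermost one.
    static ErrorType classify(Throwable ex) {
        Throwable current = ex;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            // Assumption for this sketch: a connection failure is retryable.
            if (current instanceof ConnectException) {
                return ErrorType.TRANSIENT;
            }
            current = current.getCause();
        }
        // Nothing in the chain matched a known transient failure.
        return ErrorType.ERROR;
    }
}
```

With this approach, a ConnectException is classified the same way whether it arrives directly or wrapped one or more levels deep, which is the behavior the wrapping currently hides.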