Details
Improvement
Status: Patch Available
Major
Resolution: Unresolved
4.2.0
None
None
Patch
Description
The SSH action sometimes fails with the following exception:
AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 user@XXX.XX.XX.XXX mkdir -p oozie-oozi/0000067-130808155814753-oozie-oozi-W/sshjob--ssh/ ] | EErrorStream: Warning: Permanently added (RSA) to the list of known hosts.
However, when I run the same ssh command manually on the Oozie server host, it succeeds.
Apart from incorrect SSH settings, the exception can also be caused by high load on the SSH client at connection time, network jitter, or other intermittent problems.
Once the connection fails, Oozie classifies the error as ErrorType.NON_TRANSIENT and suspends the action immediately, regardless of the configured number of retries.
When this happens, I think classifying the error as ErrorType.TRANSIENT instead of NON_TRANSIENT would be better: the action would then be retried automatically before it is suspended, which would handle occasional connection errors.
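The difference between the two classifications can be illustrated with a minimal, self-contained sketch. It is not Oozie's code: the `ErrorType` enum here only mirrors the names used in this issue, and `run`, `flakySsh`, `ActionFailure`, and `maxRetries` are hypothetical names introduced for illustration. It simulates an ssh connection that fails once and then succeeds, and shows that a NON_TRANSIENT failure suspends the action on the first attempt, while a TRANSIENT failure is retried and the action completes.

```java
public class SshRetrySketch {

    // Illustrative mirror of the two classifications discussed in this issue (not Oozie's class).
    enum ErrorType { TRANSIENT, NON_TRANSIENT }

    enum Outcome { OK, SUSPENDED }

    // Hypothetical exception carrying the classification chosen for a failed ssh call.
    static class ActionFailure extends Exception {
        final ErrorType type;

        ActionFailure(ErrorType type, String msg) {
            super(msg);
            this.type = type;
        }
    }

    // One attempt at running the action (e.g. the remote "mkdir -p" over ssh).
    @FunctionalInterface
    interface Attempt {
        void call() throws ActionFailure;
    }

    // Retries only TRANSIENT failures, up to maxRetries extra attempts.
    // A NON_TRANSIENT failure suspends the action immediately, whatever maxRetries is.
    static Outcome run(Attempt attempt, int maxRetries) {
        for (int tries = 0; ; tries++) {
            try {
                attempt.call();
                return Outcome.OK;
            } catch (ActionFailure e) {
                if (e.type == ErrorType.NON_TRANSIENT || tries >= maxRetries) {
                    return Outcome.SUSPENDED;
                }
                // A real scheduler would wait for a retry interval here before trying again.
            }
        }
    }

    // Hypothetical flaky connection: fails `failures` times with the given classification, then succeeds.
    static Attempt flakySsh(int failures, ErrorType onFailure) {
        int[] remaining = {failures};
        return () -> {
            if (remaining[0]-- > 0) {
                throw new ActionFailure(onFailure, "AUTH_FAILED: Not able to perform operation");
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(run(flakySsh(1, ErrorType.NON_TRANSIENT), 3)); // SUSPENDED
        System.out.println(run(flakySsh(1, ErrorType.TRANSIENT), 3));     // OK
    }
}
```

Under these assumptions, a connection that keeps failing is still suspended once the retry budget is used up (for example, `flakySsh(5, ErrorType.TRANSIENT)` with `maxRetries` of 3 ends in SUSPENDED), so genuinely wrong SSH settings would still surface as a suspended action, only a few attempts later.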