  1. Oozie
  2. OOZIE-2495

Change action status from ErrorType.NON_TRANSIENT to TRANSIENT when an SSH action occasionally fails with AUTH_FAILED


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.2.0
    • Fix Version/s: None
    • Component/s: action
    • Labels: None
    • Flags: Patch

    Description

      An SSH action sometimes fails with the following exception:

      AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 user@XXX.XX.XX.XXX mkdir -p oozie-oozi/0000067-130808155814753-oozie-oozi-W/sshjob--ssh/ ] | EErrorStream: Warning: Permanently added (RSA) to the list of known hosts.

      However, when I ran the same ssh command by hand on the Oozie server host, it worked.

      Besides incorrect SSH settings, the exception may also be caused by high load on the SSH client at connection time, network jitter, or other intermittent conditions.
      Once the connection fails, Oozie sets the error type to ErrorType.NON_TRANSIENT and suspends the action immediately, regardless of the configured retry count.
      In this case I think changing the error type from ErrorType.NON_TRANSIENT to TRANSIENT would be better: the action would then be retried automatically before it is suspended, which handles occasional connection errors.
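
      The difference between the two error types can be sketched roughly as follows. This is a simplified, self-contained illustration and not Oozie's actual code or the attached patch: the class TransientRetrySketch, its nested ErrorType, ActionException, Operation and Outcome types, and the execute method are all hypothetical names chosen for the example, and a real executor would also wait between attempts.

```java
// Hypothetical sketch: none of these types are Oozie classes.
class TransientRetrySketch {

    // Mirrors the two error types discussed in this issue.
    enum ErrorType { TRANSIENT, NON_TRANSIENT }

    // Final state of the action in this simplified model.
    enum Outcome { OK, SUSPENDED }

    // Exception carrying the error type that decides whether a retry is allowed.
    static class ActionException extends Exception {
        final ErrorType type;

        ActionException(ErrorType type, String message) {
            super(message);
            this.type = type;
        }
    }

    // The operation being attempted, e.g. running the remote ssh command.
    interface Operation {
        void run() throws ActionException;
    }

    // Runs the operation once, then retries up to maxRetries more times,
    // but only while the failures are classified as TRANSIENT.
    static Outcome execute(Operation op, int maxRetries) {
        int failures = 0;
        while (true) {
            try {
                op.run();
                return Outcome.OK;
            } catch (ActionException e) {
                failures++;
                // NON_TRANSIENT suspends immediately, whatever the retry budget is.
                if (e.type == ErrorType.NON_TRANSIENT || failures > maxRetries) {
                    return Outcome.SUSPENDED;
                }
                // TRANSIENT with budget left: loop and try again.
            }
        }
    }
}
```

      In this model, an operation that fails twice and then succeeds ends as OK when its failures are TRANSIENT and maxRetries is at least 2, whereas the same operation is SUSPENDED after the first failure when the failures are NON_TRANSIENT.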

      Attachments

        1. OOZIE-2495.01.patch
          1 kB
          WangMeng
        2. OOZIE-2495.02.patch
          1 kB
          WangMeng

          People

            Assignee: Unassigned
            Reporter: sjtufighter WangMeng
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: