OOZIE-2126: SSH action can be too fast for Oozie sometimes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.0
    • Component/s: action
    • Labels:
      None

      Description

      We've seen a timing problem with the SSH action: the callback can arrive too quickly, while the action is still in PREP and has not yet transitioned to RUNNING. Oozie then ignores the callback, so it won't find out that the action completed until it checks on the action itself (default=10min). This happened in an HA setup, but I think it could happen even without HA. Adding a 30 second delay to the ssh scripts made the problem go away, but ideally we should come up with a better solution.

      Here are the relevant logs:

      2015-01-16 18:00:12,916 INFO org.apache.oozie.action.ssh.SshActionExecutor: SERVER[FOO] USER[foo] GROUP[-] TOKEN[] APP[${job_name}] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] start() begins
      2015-01-16 18:00:12,917 INFO org.apache.oozie.action.ssh.SshActionExecutor: SERVER[FOO] USER[foo] GROUP[-] TOKEN[] APP[${job_name}] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] Attempting to copy ssh base scripts to remote host [foo@bar.com]
      2015-01-16 18:00:15,769 INFO org.apache.oozie.servlet.CallbackServlet: SERVER[FOO] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] callback for action [0000027-150113223634420-oozie-oozi-W@action-1]
      2015-01-16 18:00:15,774 ERROR org.apache.oozie.command.wf.CompletedActionXCommand: SERVER[FOO] USER[-] GROUP[-] TOKEN[] APP[-] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] XException,
      org.apache.oozie.command.CommandException: E0800: Action it is not running its in [PREP] state, action [0000027-150113223634420-oozie-oozi-W@action-1]
              at org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:77)
              at org.apache.oozie.command.XCommand.call(XCommand.java:251)
              at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
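
      To make the race concrete, here is a minimal, self-contained Java sketch. It is not Oozie code and not the attached patch: the class EarlyCallbackSketch, the Status enum, and both methods are hypothetical names made up for illustration. handleCallback models the behavior seen in the logs above (a callback for an action that is not RUNNING is dropped), and handleCallbackWithRetry shows one possible mitigation, assuming it is acceptable to retry the callback a bounded number of times while the action has not yet left PREP.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; none of these names exist in Oozie.
public class EarlyCallbackSketch {
    enum Status { PREP, RUNNING, DONE }

    // Models the reported behavior: the callback is only honored when the
    // action is RUNNING. A callback that arrives during PREP is dropped.
    static boolean handleCallback(AtomicReference<Status> action) {
        return action.compareAndSet(Status.RUNNING, Status.DONE);
    }

    // One possible mitigation (an assumption, not the actual fix): retry the
    // callback up to maxRetries times, waiting delayMs between attempts.
    static boolean handleCallbackWithRetry(AtomicReference<Status> action,
                                           int maxRetries, long delayMs) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (handleCallback(action)) {
                return true;                 // action was RUNNING, now DONE
            }
            if (action.get() == Status.DONE) {
                return false;                // someone else already completed it
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(delayMs);   // give start() time to finish
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;                        // still PREP after all retries
    }

    public static void main(String[] args) throws InterruptedException {
        // Callback arrives while the action is still PREP: it is dropped.
        AtomicReference<Status> early = new AtomicReference<>(Status.PREP);
        System.out.println("handled without retry: " + handleCallback(early));

        // Same situation, but the action reaches RUNNING shortly afterwards.
        AtomicReference<Status> slow = new AtomicReference<>(Status.PREP);
        Thread starter = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            slow.set(Status.RUNNING);
        });
        starter.start();
        System.out.println("handled with retry: "
                + handleCallbackWithRetry(slow, 10, 20));
        starter.join();
    }
}
```

      The retry is bounded so that a callback for an action that never leaves PREP still gives up and falls back to the periodic check, rather than blocking forever.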
      

        Attachments

        1. OOZIE-2126.patch
          17 kB
          Robert Kanter
        2. OOZIE-2126.patch
          17 kB
          Robert Kanter

              People

              • Assignee: Robert Kanter (rkanter)
              • Reporter: Robert Kanter (rkanter)