Oozie / OOZIE-1205

If the JobTracker is restarted during a Fork, Oozie doesn't fail all of the currently running actions


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.0.0
    • Component/s: action
    • Labels: None

    Description

      If you have a workflow with a fork and restart the JobTracker while it's executing the paths in the fork, those two jobs will be lost (as expected). Once the ActionCheckXCommand timeout occurs, it checks both actions sequentially. While checking the first action, it sets that action's status to FAILED and also sets the workflow's status to FAILED. It then moves on to the other action that was running concurrently, but that check cannot pass the precondition check because the workflow is already FAILED (the precondition requires the workflow to be RUNNING). It retries every time the timeout hits (10 minutes by default) and prints a WARN message in the log each time. As a result, that action stays in RUNNING state forever even though the underlying job isn't running and the workflow is FAILED.
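
      The sequence described above can be reproduced with a small stand-alone model. This is only a sketch of the reported behavior, not Oozie's actual code: the class ForkCheckSketch, its nested types, and the methods check and checkAll are hypothetical names invented for illustration, and the model assumes both Hadoop jobs were lost in the JobTracker restart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the behavior in this report.
// None of these types are Oozie classes.
class ForkCheckSketch {
    enum Status { RUNNING, FAILED }

    static class Workflow {
        Status status = Status.RUNNING;
    }

    static class Action {
        final String name;
        Status status = Status.RUNNING;

        Action(String name) {
            this.name = name;
        }
    }

    // Models one action check. Assumption: the underlying job is always lost.
    static String check(Workflow wf, Action action) {
        // Precondition from the report: the workflow must be RUNNING.
        if (wf.status != Status.RUNNING) {
            return "WARN precondition failed for " + action.name
                    + ": workflow is " + wf.status;
        }
        // The job is gone, so the action fails and takes the workflow with it.
        action.status = Status.FAILED;
        wf.status = Status.FAILED;
        return "INFO " + action.name + " marked FAILED";
    }

    // Models one timeout: every action still RUNNING is checked in order.
    static List<String> checkAll(Workflow wf, List<Action> actions) {
        List<String> log = new ArrayList<>();
        for (Action a : actions) {
            if (a.status == Status.RUNNING) {
                log.add(check(wf, a));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Workflow wf = new Workflow();
        List<Action> actions = Arrays.asList(new Action("A"), new Action("B"));
        // Two consecutive timeouts: B is never failed, only warned about.
        for (int timeout = 1; timeout <= 2; timeout++) {
            for (String line : checkAll(wf, actions)) {
                System.out.println("timeout " + timeout + ": " + line);
            }
        }
        for (Action a : actions) {
            System.out.println(a.name + " is " + a.status);
        }
        System.out.println("workflow is " + wf.status);
    }
}
```

      In this model the first check fails action A and the workflow, so every later check of action B is rejected by the precondition and B never leaves RUNNING, which matches the stuck state described in the report.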

      Attachments

        1. OOZIE-1205.patch
          3 kB
          Robert Kanter
        2. OOZIE-1205.patch
          22 kB
          Robert Kanter


          People

            Assignee: Robert Kanter (rkanter)
            Reporter: Robert Kanter (rkanter)
            Votes: 0
            Watchers: 3
