OOZIE-1879

Workflow Rerun causes error depending on the order of forked nodes

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.1.0
    • Component/s: core
    • Labels: None

    Description

      Suppose you have a workflow like this:

      start --> fork
      fork --> shell1, shell2
      shell1 --> join
      shell2 --> join
      join --> shell3
      shell3 --> end
      

      Suppose every action except shell3 succeeds.
      After you fix the problem with shell3 and rerun the workflow, one of two outcomes occurs:

      1. If shell1 finished before shell2, then the rerun succeeds
      2. If shell2 finished before shell1, then the rerun fails

      In the second outcome, the only sign of the failure is this log message:

      2014-05-29 17:17:03,735 ERROR org.apache.oozie.workflow.lite.LiteWorkflowInstance: SERVER[cdh5-1.cloudera.local] USER[pdvorak] GROUP[-] TOKEN[] APP[test-rerun-wf] JOB[0000004-140521220856264-oozie-oozi-W] ACTION[0000004-140521220856264-oozie-oozi-W@join] invalid execution path [/shell1/]
      

      After some digging, I found that during a rerun of this workflow (or a similar one), LiteWorkflowInstance#signal is called for each action under the fork node in the order the actions are listed in the fork node's XML. During the original run, however, LiteWorkflowInstance#signal is called for each action in the order the actions complete (i.e. by endTime). When the two orders differ, you get the error above.

      The general fix is therefore to ensure that, during a rerun, LiteWorkflowInstance#signal is called for each action under the fork node in the order in which the actions originally completed. That is also arguably more correct than the current behavior.
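
      The ordering idea can be illustrated with a minimal, self-contained Java sketch. It is not taken from the attached patch and does not use Oozie's classes: ForkRerunOrder, CompletedAction, and signalOrderForRerun are hypothetical names, and the end times are made-up values. It only shows how forked actions listed in XML order could be re-sorted by their recorded end time before being replayed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ForkRerunOrder {

    /** Hypothetical stand-in for a finished workflow action; not an Oozie class. */
    static final class CompletedAction {
        final String name;
        final long endTimeMillis;

        CompletedAction(String name, long endTimeMillis) {
            this.name = name;
            this.endTimeMillis = endTimeMillis;
        }
    }

    /**
     * Returns the action names sorted by ascending end time, i.e. the order in
     * which the actions finished during the original run. List.sort is stable,
     * so actions with equal end times keep their XML order.
     */
    static List<String> signalOrderForRerun(List<CompletedAction> actionsInXmlOrder) {
        List<CompletedAction> sorted = new ArrayList<>(actionsInXmlOrder);
        sorted.sort(Comparator.comparingLong(a -> a.endTimeMillis));
        List<String> names = new ArrayList<>();
        for (CompletedAction action : sorted) {
            names.add(action.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // XML order is shell1, shell2, but shell2 finished first (the failing case).
        List<CompletedAction> forked = Arrays.asList(
                new CompletedAction("shell1", 2000L),
                new CompletedAction("shell2", 1000L));
        System.out.println(signalOrderForRerun(forked)); // prints [shell2, shell1]
    }
}
```

      In this sketch, replaying the signals in the returned order (shell2, then shell1) matches the original run, whereas replaying them in XML order (shell1, then shell2) corresponds to the mismatch described above.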

      Attachments

        1. OOZIE-1879.patch
          19 kB
          Robert Kanter
        2. OOZIE-1879.patch
          19 kB
          Robert Kanter
        3. OOZIE-1879_amendment.patch
          1 kB
          Robert Kanter

            People

              rkanter Robert Kanter