OOZIE-1879: Workflow Rerun causes error depending on the order of forked nodes

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.1.0
    • Component/s: core
    • Labels: None

    Description

      Suppose you have a workflow like this:

      start --> fork
      fork --> shell1, shell2
      shell1 --> join
      shell2 --> join
      join --> shell3
      shell3 --> end
      

      Now assume that every action except shell3 succeeds.
      Once you fix the problem with shell3 and do a rerun, one of two outcomes occurs:

      1. If shell1 finished before shell2 in the original run, the rerun succeeds
      2. If shell2 finished before shell1 in the original run, the rerun fails

      In the second outcome, the only error reported is this log message:

      2014-05-29 17:17:03,735 ERROR org.apache.oozie.workflow.lite.LiteWorkflowInstance: SERVER[cdh5-1.cloudera.local] USER[pdvorak] GROUP[-] TOKEN[] APP[test-rerun-wf] JOB[0000004-140521220856264-oozie-oozi-W] ACTION[0000004-140521220856264-oozie-oozi-W@join] invalid execution path [/shell1/]
      

      After a bunch of digging, I discovered that the original run and the rerun signal the forked actions in different orders. During a rerun of the above workflow (or similar workflows), LiteWorkflowInstance#signal gets called for each action in the fork node in the order that the actions are listed in the fork node's XML. During the original run, however, LiteWorkflowInstance#signal gets called for each action in the order that they complete (i.e. by endTime). When these two orders don't match, you get the above error. The general fix is therefore to ensure that during a rerun, LiteWorkflowInstance#signal gets called for each action in the fork node in the same order as in the original run. And if you think about it, that is more correct than the current behavior anyway.
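
      The ordering idea behind the proposed fix can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code from the attached patch: RerunSignalOrder, ForkedAction, and endTimeMillis are hypothetical names, and no Oozie classes are used. It assumes each forked action's end time from the original run is available, and it sorts the actions by that time before they would be signaled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; none of these types are part of Oozie.
final class RerunSignalOrder {

    // Stand-in for a forked action and the end time recorded in the original run.
    static final class ForkedAction {
        final String name;
        final long endTimeMillis;

        ForkedAction(String name, long endTimeMillis) {
            this.name = name;
            this.endTimeMillis = endTimeMillis;
        }
    }

    // Returns a copy of the actions ordered by original completion time.
    // List.sort is stable, so actions with equal end times keep their XML order.
    static List<ForkedAction> byOriginalCompletion(List<ForkedAction> xmlOrder) {
        List<ForkedAction> sorted = new ArrayList<>(xmlOrder);
        sorted.sort(Comparator.comparingLong((ForkedAction a) -> a.endTimeMillis));
        return sorted;
    }

    public static void main(String[] args) {
        // XML order is shell1, shell2, but shell2 finished first in the original run.
        List<ForkedAction> xmlOrder = new ArrayList<>();
        xmlOrder.add(new ForkedAction("shell1", 2_000L));
        xmlOrder.add(new ForkedAction("shell2", 1_000L));

        // A rerun would signal each action in this order instead of XML order.
        for (ForkedAction action : byOriginalCompletion(xmlOrder)) {
            System.out.println("signal " + action.name);
        }
    }
}
```

      With the sample data above, main prints "signal shell2" followed by "signal shell1", which is the failing case from the description replayed in its original completion order.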

      Attachments

        1. OOZIE-1879_amendment.patch (1 kB, Robert Kanter)
        2. OOZIE-1879.patch (19 kB, Robert Kanter)
        3. OOZIE-1879.patch (19 kB, Robert Kanter)

          People

            Assignee: Robert Kanter (rkanter)
            Reporter: Robert Kanter (rkanter)
            Votes: 0
            Watchers: 3
