Oozie / OOZIE-2476

When one of the actions in a fork fails with a transient error, the workflow never joins.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.0
    • Component/s: None
    • Labels:
      None

      Description

      Noticed multiple times in our production environment.
      If one of the actions in a fork fails with a transient error (but succeeds after a few retries), the fork's paths never join.

      This happens when one of the actions in the fork fails to submit a job.
      On a transient error, Oozie requeues the command with queue(this, retryDelayMillis). ActionStartXCommand doesn't reload the job if it is not null.
      Before ActionStartXCommand runs again, other actions have already started and modified the job info. ActionStartXCommand still holds the old info, which it writes to the DB, so we lose some workflow instance data.
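
      The race described above can be reproduced in a small, self-contained sketch. This is not Oozie code: Store, StartCommand and the reloadOnRetry flag are hypothetical stand-ins for the workflow DB row, ActionStartXCommand and one possible remedy (reloading the job on every execution). The actual patch that went into 4.3.0 may differ.

```java
import java.util.Set;
import java.util.TreeSet;

public class StaleRetrySketch {

    /** Hypothetical stand-in for the workflow job row in the DB. */
    static final class Store {
        private Set<String> started = new TreeSet<>();

        Set<String> load() {
            return new TreeSet<>(started); // each caller gets its own snapshot
        }

        void save(Set<String> snapshot) {
            started = new TreeSet<>(snapshot); // overwrites the whole record
        }
    }

    /** Hypothetical stand-in for a start command that caches the job. */
    static final class StartCommand {
        private final Store store;
        private final String action;
        private final boolean reloadOnRetry;
        private int transientFailuresLeft;
        private Set<String> job; // cached snapshot, null until first load

        StartCommand(Store store, String action, int transientFailures, boolean reloadOnRetry) {
            this.store = store;
            this.action = action;
            this.transientFailuresLeft = transientFailures;
            this.reloadOnRetry = reloadOnRetry;
        }

        /** Returns false when a transient error means the command must be requeued. */
        boolean execute() {
            if (job == null || reloadOnRetry) {
                job = store.load(); // without reloadOnRetry, a retry keeps the old snapshot
            }
            if (transientFailuresLeft > 0) {
                transientFailuresLeft--;
                return false; // the caller would requeue this same object after a delay
            }
            job.add(action);
            store.save(job); // a stale snapshot clobbers the siblings' updates here
            return true;
        }
    }

    /** Runs fork actions A (one transient failure) and B, then returns the stored state. */
    static Set<String> run(boolean reloadOnRetry) {
        Store store = new Store();
        StartCommand a = new StartCommand(store, "A", 1, reloadOnRetry);
        StartCommand b = new StartCommand(store, "B", 0, reloadOnRetry);
        a.execute(); // A loads the job, hits a transient error, gets requeued
        b.execute(); // B starts in the meantime and updates the job
        a.execute(); // A's retry runs on the same command object
        return store.load();
    }

    public static void main(String[] args) {
        System.out.println("cached job:   " + run(false)); // [A]     (B's update is lost)
        System.out.println("reloaded job: " + run(true));  // [A, B]
    }
}
```

      With the cached snapshot, the retry of A overwrites B's update, which mirrors the lost workflow instance data that keeps the join from ever firing. Reloading before each execution preserves both updates in this sketch.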

            People

            • Assignee:
              puru Purshotam Shah
              Reporter:
              puru Purshotam Shah
