  Oozie / OOZIE-2476

When one of the actions in a fork fails with a transient error, the workflow never joins.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.0
    • Component/s: None
    • Labels: None

    Description

      Noticed multiple times in our production environment.
      If one of the actions in a fork fails with a transient error (but succeeds after a few retries), the forked paths never join.

      This happens when one of the actions in the fork fails to submit a job.
      On a transient error, Oozie requeues the command with queue(this, retryDelayMillis). When it runs again, ActionStartXCommand doesn't reload the workflow job if the job it already holds is not null.
      Before ActionStartXCommand runs again, other actions in the fork have already started and modified the job info. ActionStartXCommand still holds the old copy, writes it to the DB, and overwrites the newer state, so some workflow instance data is lost.
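To make the lost update concrete, here is a minimal Java sketch. It does not use Oozie's classes: `JobState`, `Db`, `StartActionCommand` and `simulate` are hypothetical stand-ins, a transient error is modeled as a `run` call that returns false and is retried later, and the fork is reduced to two paths, A and B. The command caches the job on its first run, as the description says ActionStartXCommand does, so its retry saves a stale copy over B's update. The `reloadOnRetry` flag shows one way to avoid this (reload the job before every attempt); that is an assumption about the shape of a fix, not the contents of OOZIE-2476-V1.patch.

```java
import java.util.HashMap;
import java.util.Map;

public class ForkRetrySketch {

    /** Hypothetical stand-in for the persisted workflow job (not Oozie's WorkflowJobBean). */
    static final class JobState {
        final Map<String, String> actionStatus = new HashMap<>();

        JobState copy() {
            JobState c = new JobState();
            c.actionStatus.putAll(actionStatus);
            return c;
        }
    }

    /** Hypothetical "database": load returns a copy, save overwrites the whole stored job. */
    static final class Db {
        private JobState stored = new JobState();

        JobState load() { return stored.copy(); }

        void save(JobState s) { stored = s.copy(); }
    }

    /** Hypothetical start-action command that may be retried after a transient error. */
    static final class StartActionCommand {
        private final Db db;
        private final String action;
        private final boolean reloadOnRetry;
        private JobState job; // cached across retries, like a field on the command object

        StartActionCommand(Db db, String action, boolean reloadOnRetry) {
            this.db = db;
            this.action = action;
            this.reloadOnRetry = reloadOnRetry;
        }

        /** Returns false on a transient failure; the caller is expected to retry later. */
        boolean run(boolean transientFailure) {
            if (job == null || reloadOnRetry) {
                job = db.load(); // without reloadOnRetry, the job is loaded only on the first run
            }
            if (transientFailure) {
                return false; // requeued; the cached job is kept for the retry
            }
            job.actionStatus.put(action, "OK");
            db.save(job); // writes the whole job, possibly a stale copy
            return true;
        }
    }

    /** Fork paths A and B, where A's first attempt hits a transient error. */
    public static Map<String, String> simulate(boolean reloadOnRetry) {
        Db db = new Db();
        StartActionCommand a = new StartActionCommand(db, "A", reloadOnRetry);
        StartActionCommand b = new StartActionCommand(db, "B", reloadOnRetry);
        a.run(true);  // A: transient error, waits for its retry delay
        b.run(false); // B: starts and persists while A is waiting
        a.run(false); // A: retry succeeds and saves its view of the job
        return db.load().actionStatus;
    }

    public static void main(String[] args) {
        System.out.println("cached job:  " + simulate(false));
        System.out.println("reloaded job: " + simulate(true));
    }
}
```

With `reloadOnRetry` false, `simulate` ends with only A recorded, so a join waiting for both paths would never fire; with it true, both A and B survive.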

      Attachments

        1. OOZIE-2476-V1.patch (2 kB, Purshotam Shah)


          People

            Assignee: Purshotam Shah (puru)
            Reporter: Purshotam Shah (puru)
            Votes: 0
            Watchers: 3
