  1. Oozie
  2. OOZIE-1735

Support resuming of failed coordinator job and rerun of a failed coordinator action

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1.0
    • Component/s: None
    • Labels: None

      Description

      We should support resuming a failed coordinator job. Jobs are set to failed when a runtime error occurs (such as a SQL timeout).
      Currently there is no way to recover besides running SQL manually.
      Resuming a failed coordinator job should also set pending to 1, reset doneMaterialization, and set last modified to the current time, so that materialization continues.

      We should also provide an option to resume a failed action. The behavior will be the same as for the killed option.
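
      The field resets described above for resuming a failed job can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class CoordJobSketch, its fields, and the method resumeFailed are hypothetical stand-ins, and both the target status (RUNNING) and the reading of "reset doneMaterialization" as "set it back to false" are assumptions the description does not spell out.

```java
import java.time.Instant;

/** Hypothetical, simplified stand-in for a coordinator job record; not Oozie's actual bean. */
final class CoordJobSketch {
    enum Status { RUNNING, FAILED, KILLED }

    Status status;
    int pending;                 // the description asks for this to be set to 1 on resume
    boolean doneMaterialization; // true once materialization has been marked as finished
    Instant lastModified;

    CoordJobSketch(Status status, int pending, boolean doneMaterialization, Instant lastModified) {
        this.status = status;
        this.pending = pending;
        this.doneMaterialization = doneMaterialization;
        this.lastModified = lastModified;
    }

    /** Applies the resets listed in the description to a FAILED job. */
    void resumeFailed(Instant now) {
        if (status != Status.FAILED) {
            throw new IllegalStateException("this sketch only handles FAILED jobs, got " + status);
        }
        status = Status.RUNNING;     // assumption: the description does not name the target status
        pending = 1;                 // "set pending to 1"
        doneMaterialization = false; // assumption: "reset" means clearing the flag
        lastModified = now;          // "last modified to current time"
    }
}
```

      A caller would build a job in the FAILED state and invoke resumeFailed(Instant.now()); any other status is rejected, since only the failed case is discussed here.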

        Attachments

        1. OOZIE-1735-V3.patch
          20 kB
          Purshotam Shah
        2. OOZIE-1735-V2.patch
          20 kB
          Purshotam Shah
        3. OOZIE-1735-V2.patch
          20 kB
          Purshotam Shah
        4. OOZIE-1735_v1.patch
          20 kB
          Purshotam Shah

              People

              • Assignee: puru Purshotam Shah
              • Reporter: puru Purshotam Shah
              • Votes: 0
              • Watchers: 4
