Oozie / OOZIE-994

ActionCheckXCommand does not handle failures properly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0
    • Component/s: workflow
    • Labels:
      None

      Description

      If the JobTracker (JT) restarts or dies and running jobs are lost, or if the JT is unreachable, Oozie's ActionCheckXCommand never fails the workflow job.

      There appear to be two issues:

      • convertException no longer receives the root-cause exception; it always receives a HadoopAccessorException wrapping the root cause. convertException should be modified to inspect the cause exception as well.
      • ActionCheckXCommand does not implement the retry-handling logic that ActionStartXCommand has.
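      The two fixes described above can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the code from the attached patch. HadoopAccessorException here is a local stand-in for Oozie's class of that name, and ErrorType, Outcome, classify, and onCheckFailure are invented names. The sketch assumes, for illustration only, that any java.io.IOException found anywhere in the cause chain (such as a refused connection to an unreachable JT) is transient and worth retrying, and that every other error is fatal.

```java
import java.io.IOException;
import java.net.ConnectException;

/**
 * Hypothetical sketch, not Oozie's actual code. All names other than the
 * java.* classes are illustrative assumptions.
 */
public class ActionCheckSketch {

    /** Local stand-in for a wrapper exception that hides the root cause. */
    static class HadoopAccessorException extends Exception {
        HadoopAccessorException(Throwable cause) {
            super(cause);
        }
    }

    enum ErrorType { TRANSIENT, NON_TRANSIENT }

    enum Outcome { RETRY, FAIL }

    /**
     * Walks the cause chain so that a wrapper exception does not hide the
     * root cause. Assumption: any IOException in the chain is transient.
     */
    static ErrorType classify(Throwable t) {
        Throwable current = t;
        int depth = 0;
        // The depth limit guards against pathological cyclic cause chains.
        while (current != null && depth < 32) {
            if (current instanceof IOException) {
                return ErrorType.TRANSIENT;
            }
            current = current.getCause();
            depth++;
        }
        return ErrorType.NON_TRANSIENT;
    }

    /**
     * Bounded retry: retry transient errors until maxRetries is reached,
     * then fail the action instead of checking it forever.
     */
    static Outcome onCheckFailure(Throwable t, int retriesSoFar, int maxRetries) {
        if (classify(t) == ErrorType.TRANSIENT && retriesSoFar < maxRetries) {
            return Outcome.RETRY;
        }
        return Outcome.FAIL;
    }

    public static void main(String[] args) {
        // A refused connection wrapped the same way the issue describes.
        Exception wrapped = new HadoopAccessorException(new ConnectException("Connection refused"));
        System.out.println(onCheckFailure(wrapped, 0, 3)); // RETRY
        System.out.println(onCheckFailure(wrapped, 3, 3)); // FAIL (retries exhausted)
        System.out.println(onCheckFailure(new IllegalStateException("not an I/O error"), 0, 3)); // FAIL
    }
}
```

      Walking getCause() rather than testing only the top-level exception is what lets the wrapped ConnectException be recognized as transient, and the bounded retry count ensures the action is eventually failed rather than left in a running state.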

        Attachments

        1. OOZIE-994.patch (28 kB, Robert Kanter)
        2. OOZIE-994.patch (28 kB, Robert Kanter)
        3. OOZIE-994.patch (27 kB, Robert Kanter)
        4. OOZIE-994.patch (26 kB, Robert Kanter)
        5. OOZIE-994.patch (22 kB, Robert Kanter)
        6. OOZIE-994.patch (22 kB, Robert Kanter)
        7. OOZIE-994.patch (22 kB, Robert Kanter)

              People

              • Assignee: Robert Kanter (rkanter)
              • Reporter: Alejandro Abdelnur (tucu00)
              • Votes: 0
              • Watchers: 6
