OOZIE-1550

Create a safeguard to kill errant recursive workflows before they bring down Oozie

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.2, 4.0.0
    • Fix Version/s: trunk
    • Component/s: workflow
    • Labels:

      Description

      If a user creates an errant workflow with a sub-workflow that calls the workflow again, without a proper decision node to exit the workflow, it will keep creating jobs until the Oozie server is saturated. A user recently had 400,000 running jobs and Oozie was non-responsive. I suggest we add some way to prevent a user from taking down Oozie, such as a max-running-jobs parameter.

      1. OOZIE-1550.patch
        9 kB
        Robert Kanter
      2. OOZIE-1550.patch
        9 kB
        Robert Kanter

          Activity

          Robert Kanter added a comment -

          Committed to trunk!

          Robert Kanter added a comment -

          Test failure unrelated (a port was already in use)

          Hadoop QA added a comment -

          Testing JIRA OOZIE-1550

          Cleaning local svn workspace

          ----------------------------

          +1 PATCH_APPLIES
          +1 CLEAN
          +1 RAW_PATCH_ANALYSIS
          . +1 the patch does not introduce any @author tags
          . +1 the patch does not introduce any tabs
          . +1 the patch does not introduce any trailing spaces
          . +1 the patch does not introduce any line longer than 132
          . +1 the patch does adds/modifies 2 testcase(s)
          +1 RAT
          . +1 the patch does not seem to introduce new RAT warnings
          +1 JAVADOC
          . +1 the patch does not seem to introduce new Javadoc warnings
          +1 COMPILE
          . +1 HEAD compiles
          . +1 patch compiles
          . +1 the patch does not seem to introduce new javac warnings
          +1 BACKWARDS_COMPATIBILITY
          . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
          . +1 the patch does not modify JPA files
          -1 TESTS
          . Tests run: 1352
          . Tests failed: 1
          . Tests errors: 0

          . The patch failed the following testcases:

          . testConnectionDrop(org.apache.oozie.jms.TestJMSJobEventListener)

          +1 DISTRO
          . +1 distro tarball builds with the patch

          ----------------------------
          -1 Overall result, please check the reported -1(s)

          The full output of the test-patch run is available at

          . https://builds.apache.org/job/oozie-trunk-precommit-build/886/

          Alejandro Abdelnur added a comment -

          +1 after jenkins

          Robert Kanter added a comment -

          Updated patch to rename the method and set the default to 50.

          Alejandro Abdelnur added a comment -

          Looks good, +1 after the following comments are addressed and jenkins passes again:

          • The method injectSubworkflowDepth should be verifyAndInjectSubworkflowDepth
          • The default of depth 10 seems too conservative and it may break complex apps, I'd put something like 50
          Hadoop QA added a comment -

          Testing JIRA OOZIE-1550

          Cleaning local svn workspace

          ----------------------------

          +1 PATCH_APPLIES
          +1 CLEAN
          +1 RAW_PATCH_ANALYSIS
          . +1 the patch does not introduce any @author tags
          . +1 the patch does not introduce any tabs
          . +1 the patch does not introduce any trailing spaces
          . +1 the patch does not introduce any line longer than 132
          . +1 the patch does adds/modifies 2 testcase(s)
          +1 RAT
          . +1 the patch does not seem to introduce new RAT warnings
          +1 JAVADOC
          . +1 the patch does not seem to introduce new Javadoc warnings
          +1 COMPILE
          . +1 HEAD compiles
          . +1 patch compiles
          . +1 the patch does not seem to introduce new javac warnings
          +1 BACKWARDS_COMPATIBILITY
          . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
          . +1 the patch does not modify JPA files
          +1 TESTS
          . Tests run: 1345
          +1 DISTRO
          . +1 distro tarball builds with the patch

          ----------------------------
          +1 Overall result, good!, no -1s

          The full output of the test-patch run is available at

          . https://builds.apache.org/job/oozie-trunk-precommit-build/832/

          Robert Kanter added a comment -

          I created a patch that implements option 2 because (a) it's more flexible and (b) it will catch infinite loops involving more than one workflow definition.

          By default, I set it to allow 10 subworkflows to be created; the 11th will cause the subwf action to do the error transition instead of starting the workflow. The code is actually quite simple and just puts a depth counter in each subworkflow.
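
          A minimal sketch of the depth-counter idea described above, not the code in the attached patch. It assumes the counter travels in the job configuration under a made-up key; the class name, method name, and key are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a sub-workflow depth guard. All names here are hypothetical. */
class SubworkflowDepthGuard {
    // Hypothetical configuration key passed from a parent workflow to its child.
    static final String DEPTH_KEY = "example.subworkflow.depth";

    private final int maxDepth;

    SubworkflowDepthGuard(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    /**
     * Reads the parent's depth (0 for a top-level workflow), rejects the launch
     * if the child would exceed the limit, and otherwise returns a copy of the
     * configuration with the child's depth recorded in it.
     */
    Map<String, String> verifyAndInjectDepth(Map<String, String> parentConf) {
        int parentDepth = Integer.parseInt(parentConf.getOrDefault(DEPTH_KEY, "0"));
        int childDepth = parentDepth + 1;
        if (childDepth > maxDepth) {
            // A caller would translate this into the action's error transition.
            throw new IllegalStateException(
                "sub-workflow depth " + childDepth + " exceeds limit " + maxDepth);
        }
        Map<String, String> childConf = new HashMap<>(parentConf);
        childConf.put(DEPTH_KEY, Integer.toString(childDepth));
        return childConf;
    }

    public static void main(String[] args) {
        SubworkflowDepthGuard guard = new SubworkflowDepthGuard(10);
        Map<String, String> conf = new HashMap<>();
        int launched = 0;
        // Simulate a workflow that keeps calling itself as a sub-workflow.
        while (true) {
            try {
                conf = guard.verifyAndInjectDepth(conf);
                launched++;
            } catch (IllegalStateException e) {
                break;
            }
        }
        System.out.println("launched " + launched + " sub-workflows before the guard tripped");
    }
}
```

          With a limit of 10, the loop in main launches ten nested sub-workflows and the eleventh attempt is rejected, so a runaway recursion stops after a bounded number of jobs.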

          Mona Chitnis added a comment -

          Option 2 seems more flexible, since there may be users out there who want to reuse workflows in a way that essentially creates a loop.

          Robert Kanter added a comment -

          We were discussing this and had a couple ideas on ways to prevent this from happening:

          1. Before running the subworkflow, Oozie could check if the subworkflow's path matches the path of any of its ancestors and prevent it from running
            • Some users rely on this functionality to create a loop, so this would have to be disabled by default (i.e. the current behavior), but an oozie-site config could turn it on
          2. Add a limit to the "depth" of subworkflows
            • e.g. if configured to 3, then if wfA calls wfB, which calls wfC, it won't allow wfC to call wfD, regardless of the workflows themselves

          #1 is technically the more correct way because the current behavior essentially violates the principle that workflows should be DAGs. So I think that's the way to go. Thoughts?
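
          For illustration, option 1 could be sketched roughly as follows. This is an assumption-laden sketch rather than anything from Oozie itself: the class, its methods, and the idea of carrying the ancestor paths as a list are hypothetical, and real application paths would likely need fuller normalization than trimming trailing slashes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of an ancestor-path check for sub-workflows. All names are hypothetical. */
class AncestorPathCheck {

    /** Trims trailing slashes so "/apps/wfA" and "/apps/wfA/" compare equal. */
    static String normalize(String path) {
        return path.replaceAll("/+$", "");
    }

    /** True if childPath is already on the chain of (normalized) ancestor paths. */
    static boolean wouldRecurse(List<String> ancestorPaths, String childPath) {
        return ancestorPaths.contains(normalize(childPath));
    }

    /**
     * Returns the ancestor chain to hand to the child workflow, or throws if
     * launching childPath would re-enter a workflow that is already running
     * higher up the chain.
     */
    static List<String> descend(List<String> ancestorPaths, String childPath) {
        if (wouldRecurse(ancestorPaths, childPath)) {
            throw new IllegalStateException("recursive sub-workflow: " + childPath);
        }
        List<String> next = new ArrayList<>(ancestorPaths);
        next.add(normalize(childPath));
        return Collections.unmodifiableList(next);
    }
}
```

          Unlike a depth limit, a check like this rejects the first repeated path, which is why it would also break workflows that loop on purpose.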


            People

            • Assignee:
              Robert Kanter
              Reporter:
              Robert Justice
            • Votes:
              0
              Watchers:
              5
