  Oozie / OOZIE-1813

Add service to report/kill rogue bundles and coordinator jobs


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.0
    • Component/s: None
    • Labels:
      None

      Description

      People leave their test coordinator and bundle jobs running without ever killing them,
      and these jobs consume significant resources. We should have a service that periodically checks for abandoned coordinators and reports or kills them.
      Several criteria could be used to identify them, such as the number of consecutive failed/timed-out actions or the total number of failed/timed-out actions.

      To start with, if the number of coordinator actions with FAILED or TIMEDOUT status exceeds a configured threshold, the coordinator is considered rogue.
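      As a rough illustration of the starting rule, the Java sketch below counts a coordinator's FAILED/TIMEDOUT actions and compares the total against a threshold. It also computes the longest run of consecutive FAILED/TIMEDOUT actions, the other criterion mentioned above. The `ActionStatus` enum, the `RogueCoordinatorCheck` class, and its method names are hypothetical and not part of Oozie's API; loading actions from the Oozie store and the actual report/kill step are left out.

```java
import java.util.List;

// Hypothetical status values for illustration only; Oozie's own
// CoordinatorAction.Status enum has more members and is not used here.
enum ActionStatus { WAITING, RUNNING, SUCCEEDED, FAILED, TIMEDOUT, KILLED }

// Hypothetical helper implementing the "to start with" rogue-coordinator rule.
final class RogueCoordinatorCheck {

    private RogueCoordinatorCheck() {}

    /** True for the action states the proposed rule counts against a coordinator. */
    static boolean isFailedOrTimedOut(ActionStatus s) {
        return s == ActionStatus.FAILED || s == ActionStatus.TIMEDOUT;
    }

    /** Total number of FAILED or TIMEDOUT actions for one coordinator. */
    static long countFailedOrTimedOut(List<ActionStatus> actions) {
        return actions.stream()
                .filter(RogueCoordinatorCheck::isFailedOrTimedOut)
                .count();
    }

    /** Longest run of consecutive FAILED/TIMEDOUT actions, in list order (assumed nominal-time order). */
    static int maxConsecutiveFailedOrTimedOut(List<ActionStatus> actions) {
        int longest = 0;
        int current = 0;
        for (ActionStatus s : actions) {
            // Extend the current run on a bad action, otherwise reset it.
            current = isFailedOrTimedOut(s) ? current + 1 : 0;
            longest = Math.max(longest, current);
        }
        return longest;
    }

    /** Starting rule: rogue when the failed/timed-out count exceeds the threshold. */
    static boolean isRogue(List<ActionStatus> actions, long threshold) {
        return countFailedOrTimedOut(actions) > threshold;
    }
}
```

      A periodic service could call `isRogue` for each running coordinator and then report or kill the ones that match; the strict `>` comparison follows the wording of the rule above.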

        Attachments

        1. OOZIE-1813-V2.patch
          21 kB
          Purshotam Shah
        2. OOZIE-1813-V3.patch
          21 kB
          Purshotam Shah
        3. OOZIE-1813-V4.patch
          21 kB
          Purshotam Shah
        4. OOZIE-1813-V5.patch
          21 kB
          Purshotam Shah
        5. OOZIE-1813-V6.patch
          22 kB
          Purshotam Shah
        6. OOZIE-1813-V7.patch
          22 kB
          Purshotam Shah
        7. OOZIE-1813-V8.patch
          25 kB
          Purshotam Shah
        8. OOZIE-1813-Amendment-V1.patch
          16 kB
          Purshotam Shah


              People

              • Assignee:
                puru Purshotam Shah
                Reporter:
                puru Purshotam Shah
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: