Oozie / OOZIE-1813

Add service to report/kill rogue bundles and coordinator jobs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.0
    • Component/s: None
    • Labels: None

    Description

      People often leave test coordinator and bundle jobs running without ever killing them,
      and these jobs consume significant resources. We should have a service that periodically checks for abandoned coordinators and reports or kills them.
      Several criteria could be used, such as the number of consecutive failed/timed-out actions or the total number of failed/timed-out actions.

      To start with, if the number of coordinator actions with failed/timedout status exceeds a defined value, the coordinator is considered rogue.
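
      The initial rule described above can be sketched roughly as follows. This is only an illustration and not the code from the attached patches: the class name RogueCoordCheck, the simplified ActionStatus enum, and the failureLimit parameter are all hypothetical, and the streak helper only illustrates the "consecutive failures" criterion mentioned as a possible extension.

```java
import java.util.List;

// Hypothetical sketch; names and types are assumptions, not Oozie's actual classes.
class RogueCoordCheck {

    // Simplified stand-in for coordinator action statuses.
    enum ActionStatus { SUCCEEDED, FAILED, TIMEDOUT, KILLED, RUNNING, WAITING }

    // True when an action ended in one of the two statuses the rule counts.
    static boolean isFailure(ActionStatus status) {
        return status == ActionStatus.FAILED || status == ActionStatus.TIMEDOUT;
    }

    // Total number of failed or timed-out actions for one coordinator.
    static int countFailures(List<ActionStatus> actions) {
        int count = 0;
        for (ActionStatus status : actions) {
            if (isFailure(status)) {
                count++;
            }
        }
        return count;
    }

    // Initial rule: rogue when the failure count strictly exceeds the limit.
    static boolean isRogue(List<ActionStatus> actions, int failureLimit) {
        return countFailures(actions) > failureLimit;
    }

    // Possible extension: longest run of consecutive failed/timed-out actions.
    static int longestFailureStreak(List<ActionStatus> actions) {
        int longest = 0;
        int current = 0;
        for (ActionStatus status : actions) {
            current = isFailure(status) ? current + 1 : 0;
            longest = Math.max(longest, current);
        }
        return longest;
    }
}
```

      A periodic service would call isRogue for each running coordinator and then either report or kill the ones it flags; how the limit is configured and how actions are loaded is left out of this sketch.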

      Attachments

        1. OOZIE-1813-V2.patch
          21 kB
          Purshotam Shah
        2. OOZIE-1813-V3.patch
          21 kB
          Purshotam Shah
        3. OOZIE-1813-V4.patch
          21 kB
          Purshotam Shah
        4. OOZIE-1813-V5.patch
          21 kB
          Purshotam Shah
        5. OOZIE-1813-V6.patch
          22 kB
          Purshotam Shah
        6. OOZIE-1813-V7.patch
          22 kB
          Purshotam Shah
        7. OOZIE-1813-V8.patch
          25 kB
          Purshotam Shah
        8. OOZIE-1813-Amendment-V1.patch
          16 kB
          Purshotam Shah
        9. OOZIE-1813-Amendment-V1.patch
          16 kB
          Purshotam Shah

            People

              Assignee: Purshotam Shah (puru)
              Reporter: Purshotam Shah (puru)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: