Sling / SLING-5965

Metrics and a Health-Check for Scheduler to detect long-running Quartz-Jobs

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Commons Scheduler 2.5.0
    • Fix Version/s: Commons Scheduler 2.7.0
    • Component/s: Commons
    • Labels: None

    Description

      Sling Scheduler jobs (also known as Quartz jobs) should typically be fast-running. They are served from a thread pool and should occupy a thread only for a short time.

      If 'misbehaving' Quartz jobs run for a very long time, they keep threads from that pool occupied and thereby degrade the performance of other scheduled Quartz jobs.

      We should have metrics (using sling.commons.metrics) that expose the internals of the Sling Scheduler: the average and maximum duration of scheduled jobs, the number of jobs currently running, and how long the oldest running job has been running.
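
The bookkeeping behind such metrics can be sketched in plain Java. This is a minimal illustration only, not the Sling Commons Scheduler implementation and not the sling.commons.metrics API: the class `RunningJobTracker` and all of its method names are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical tracker for scheduler job metrics (illustration only). */
final class RunningJobTracker {
    // Job id -> start timestamp in milliseconds, for jobs still running.
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();
    private final AtomicLong finishedCount = new AtomicLong();
    private final AtomicLong totalDurationMs = new AtomicLong();
    private final AtomicLong maxDurationMs = new AtomicLong();

    /** Record that a job began executing at the given time. */
    void jobStarted(String jobId, long nowMs) {
        startTimes.put(jobId, nowMs);
    }

    /** Record that a job finished and fold its duration into the statistics. */
    void jobFinished(String jobId, long nowMs) {
        Long start = startTimes.remove(jobId);
        if (start == null) {
            return; // unknown job id: nothing to account for
        }
        long duration = nowMs - start;
        finishedCount.incrementAndGet();
        totalDurationMs.addAndGet(duration);
        maxDurationMs.accumulateAndGet(duration, Math::max);
    }

    /** Number of jobs currently running. */
    int runningJobs() {
        return startTimes.size();
    }

    /** How long the oldest still-running job has been running; 0 when idle. */
    long oldestRunningJobMillis(long nowMs) {
        long oldestStart = Long.MAX_VALUE;
        for (long start : startTimes.values()) {
            oldestStart = Math.min(oldestStart, start);
        }
        return oldestStart == Long.MAX_VALUE ? 0L : nowMs - oldestStart;
    }

    /** Average duration of finished jobs; 0 when none have finished. */
    double averageDurationMs() {
        long n = finishedCount.get();
        return n == 0 ? 0.0 : (double) totalDurationMs.get() / n;
    }

    /** Longest duration seen among finished jobs. */
    long maxDurationMs() {
        return maxDurationMs.get();
    }
}
```

In a real bundle these values would presumably be published as gauges and timers through the metrics service rather than read directly; that wiring is left out here.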

      Based on this, a Health-Check can monitor the 'oldest job running' metric and report critical when, for example, the oldest job has been running for more than 60,000 ms (the default threshold, which is configurable).
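
The threshold rule described above can be sketched as follows. This is a hedged illustration in plain Java, not the Sling Health Check API: `OldestJobHealthCheck`, its `Status` enum, and the constant name are hypothetical, and the age of the oldest running job is assumed to be supplied by whatever metric the scheduler exposes.

```java
/** Hypothetical health check on the age of the oldest running job (illustration only). */
final class OldestJobHealthCheck {
    enum Status { OK, CRITICAL }

    // Default threshold taken from the issue description: 60,000 ms.
    static final long DEFAULT_MAX_AGE_MS = 60_000L;

    private final long maxAgeMs;

    /** Uses the default threshold. */
    OldestJobHealthCheck() {
        this(DEFAULT_MAX_AGE_MS);
    }

    /** Uses a configured threshold, which must be positive. */
    OldestJobHealthCheck(long maxAgeMs) {
        if (maxAgeMs <= 0) {
            throw new IllegalArgumentException("maxAgeMs must be positive");
        }
        this.maxAgeMs = maxAgeMs;
    }

    /** CRITICAL when the oldest running job is older than the threshold, OK otherwise. */
    Status check(long oldestRunningJobAgeMs) {
        return oldestRunningJobAgeMs > maxAgeMs ? Status.CRITICAL : Status.OK;
    }
}
```

For instance, `new OldestJobHealthCheck().check(75_000L)` yields `CRITICAL`, while a check constructed with a larger configured threshold would still report `OK` for the same age.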

      Attachments

        1. SLING-5965.patch
          18 kB
          Stefan Egli
        2. SLING-5965.v2.patch.txt
          10 kB
          Stefan Egli
        3. SLING-5965.v3.patch.txt
          34 kB
          Stefan Egli
        4. SchedulerHealthCheck.jpg
          91 kB
          Stefan Egli
        5. oldestRunningJob.jpg
          102 kB
          Stefan Egli
        6. numRunningJobs.jpg
          59 kB
          Stefan Egli
        7. timers.jpg
          101 kB
          Stefan Egli
        8. SLING-5965.v4.patch.txt
          35 kB
          Stefan Egli
        9. SLING-5965.v5.patch.txt
          56 kB
          Stefan Egli
        10. patch.txt
          20 kB
          Carsten Ziegeler

            People

              Assignee: Stefan Egli (stefanegli)
              Reporter: Stefan Egli (stefanegli)
              Votes: 0
              Watchers: 4