  1. Chukwa
  2. CHUKWA-534

Improve fault-tolerance of DemuxManager, PostProcessManager and ChukwaArchiveManager.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      If any of these processes receives more than N consecutive errors, it dies with the message "Too many errors, Bail out!".

      Let's change this by introducing a configurable number of consecutive exceptions that may be encountered before dying. If the value is set to -1, the expected behavior is to keep retrying ad infinitum.

      Also part of this issue is to improve logging of how many consecutive errors have occurred, as well as the time they started. A possible future enhancement could be to support an error time threshold in addition to an absolute count.

      Suggesting the following new config settings, one per process. The names are a bit verbose, but they're clear.

      demux.max.error.count.before.shutdown
      post.process.max.error.count.before.shutdown
      archive.max.error.count.before.shutdown
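
The behavior described above could be sketched roughly as follows. This is an illustration only, not the code from the attached patches: the class name ConsecutiveErrorTracker, its methods, and the use of java.util.Properties as a stand-in for the Chukwa configuration object are all hypothetical. It also assumes that "more than N consecutive errors" means shutdown happens on error N+1, and that any negative limit (not just -1) means retry forever.

```java
import java.time.Instant;
import java.util.Properties;

/** Hypothetical helper; names are assumptions, not taken from the patches. */
final class ConsecutiveErrorTracker {
    private final int maxErrors;          // negative means "never shut down"
    private int consecutiveErrors = 0;    // length of the current error streak
    private Instant firstErrorAt = null;  // when the current streak started

    ConsecutiveErrorTracker(int maxErrors) {
        this.maxErrors = maxErrors;
    }

    /** Reads the limit from a property such as demux.max.error.count.before.shutdown. */
    static ConsecutiveErrorTracker fromProperties(Properties props, String key, int defaultMax) {
        String raw = props.getProperty(key, String.valueOf(defaultMax));
        return new ConsecutiveErrorTracker(Integer.parseInt(raw.trim()));
    }

    /** Call after a successful iteration: the streak is reset. */
    void recordSuccess() {
        consecutiveErrors = 0;
        firstErrorAt = null;
    }

    /** Call after a failed iteration; returns true if the process should shut down. */
    boolean recordError(Instant now) {
        if (consecutiveErrors == 0) {
            firstErrorAt = now;  // remember when this streak began
        }
        consecutiveErrors++;
        // Assumption: shut down only once the streak exceeds the limit.
        return maxErrors >= 0 && consecutiveErrors > maxErrors;
    }

    /** Log-friendly summary: how many consecutive errors, and since when. */
    String describe() {
        if (consecutiveErrors == 0) {
            return "no consecutive errors";
        }
        return consecutiveErrors + " consecutive error(s) since " + firstErrorAt;
    }
}
```

In a manager's main loop, one would presumably call recordSuccess() after each clean iteration, and on an exception log describe() and exit only when recordError(Instant.now()) returns true.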
      

        Attachments

        1. CHUKWA-534_1.patch
          4 kB
          Bill Graham
        2. CHUKWA-534_2.patch
          5 kB
          Bill Graham
        3. CHUKWA-534_3.patch
          11 kB
          Bill Graham


            People

            • Assignee:
              billgraham Bill Graham
              Reporter:
              billgraham Bill Graham
