  1. Chukwa
  2. CHUKWA-487

Collector left in a bad state after temporary NN outage

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: None
    • Component/s: Data Collection
    • Labels:
      None

      Description

      When the name node returns errors to the collector, the collector eventually dies halfway. It should instead either keep retrying, as the agents do, or shut down completely. What I'm seeing is that the collector logs that it is shutting down and the var/pidDir/Collector.pid file is removed, but the process keeps running without handling new data. The following log entry is repeated indefinitely:

      2010-05-06 17:35:06,375 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
      2010-05-06 17:36:06,379 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
      2010-05-06 17:37:06,384 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
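
      The two behaviors requested above (keep retrying, or shut down completely) can be illustrated with a minimal Java sketch. This is not the Chukwa collector code and not the attached patch; the names CollectorFailureSketch, retry and StatsReporter are hypothetical. It assumes the periodic stats line comes from a java.util.Timer, whose thread is non-daemon by default, so a shutdown path that does not cancel it can leave the process alive.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names come from Chukwa.
class CollectorFailureSketch {

    // Option 1: keep trying, as the agents do. Runs op up to maxAttempts
    // times, sleeping delayMs between failed attempts.
    static <T> T retry(Callable<T> op, int maxAttempts, long delayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // e.g. an IOException from the file system
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
        throw new IllegalStateException("giving up after " + maxAttempts + " attempts", last);
    }

    // Option 2: shut down completely. The timer thread is non-daemon, so it
    // must be cancelled explicitly or the JVM will not exit on its own.
    static final class StatsReporter {
        private final Timer timer = new Timer("stats-timer");
        private final AtomicLong reports = new AtomicLong();
        private volatile boolean running;

        void start(long periodMs) {
            running = true;
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    reports.incrementAndGet(); // stand-in for the stats log line
                }
            }, periodMs, periodMs);
        }

        void shutdown() {
            timer.cancel(); // lets the timer thread terminate
            running = false;
        }

        boolean isRunning() {
            return running;
        }
    }

    public static void main(String[] args) {
        StatsReporter stats = new StatsReporter();
        stats.start(1000);
        try {
            System.out.println(retry(() -> "written", 3, 100));
        } finally {
            stats.shutdown(); // without this the process would keep running
        }
    }
}
```

      Whichever option is chosen, the point is that failure handling ends in one well-defined state: either the write eventually succeeds, or every background thread is stopped so the process can exit.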

        Attachments

        1. CHUKWA-487.patch
          2 kB
          Ari Rabkin
        2. CHUKWA-487.threaddump.txt
          257 kB
          Bill Graham

          Activity

            People

            • Assignee:
              Unassigned
            • Reporter:
              Bill Graham (billgraham)
