Hadoop Common / HADOOP-4805

Remove black list feature from Chukwa Agent to Chukwa Collector communication

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Redhat EL 5, Java 6

    • Hadoop Flags:
      Reviewed

      Description

      Recently, a new load-balancing algorithm was added to improve Chukwa agent to Chukwa collector communication. The design was to send one HTTP POST per collector and rotate through the list of collectors to balance the load. When a collector fails to respond, it is blacklisted for 5 minutes. If no collector responds, the agent sleeps for a random 1-5 minutes. Unfortunately, this algorithm causes problems on slower machines: they end up blacklisting all collectors and sleeping indefinitely. This ticket restores the algorithm to the original design. The agent shuffles the collector list, then makes a best effort to keep sending HTTP POSTs to the same collector until an error occurs, at which point it iterates through the shuffled list of collectors.
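
The restored behavior described above can be sketched roughly as follows. This is an illustration only, not the actual Chukwa HTTP sender: the class name `StickyCollectorSender`, its methods, and the `Predicate<String>` stand-in for the HTTP POST are all hypothetical. It shows a shuffled collector list, a sender that stays on one collector while posts succeed, and a move to the next collector on error, with no blacklist and no sleep.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

/** Hypothetical sketch of the "sticky until error" design; not Chukwa's real class. */
final class StickyCollectorSender {
    private final List<String> collectors; // shuffled once at construction
    private int current = 0;               // index of the collector in use

    StickyCollectorSender(List<String> collectorUrls, Random rng) {
        if (collectorUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one collector is required");
        }
        this.collectors = new ArrayList<>(collectorUrls);
        Collections.shuffle(this.collectors, rng);
    }

    /** The collector the next post will be sent to first. */
    String currentCollector() {
        return collectors.get(current);
    }

    /**
     * Tries the current collector first and advances through the shuffled list
     * on failure. The post argument stands in for the HTTP POST and returns
     * true on success. Returns the collector that accepted the post, or null
     * if every collector failed once.
     */
    String send(Predicate<String> post) {
        for (int attempts = 0; attempts < collectors.size(); attempts++) {
            String target = collectors.get(current);
            if (post.test(target)) {
                return target; // stay on this collector for the next send
            }
            current = (current + 1) % collectors.size(); // error: try the next one
        }
        return null; // caller decides whether to retry later
    }
}
```

Because no collector is ever excluded, a slow machine that times out against every collector still retries all of them on the next send instead of waiting for a blacklist to expire.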

          Activity

          eyang Eric Yang added a comment -

          Restore chukwa agent HTTP Sender back one revision.

          jboulon Jerome Boulon added a comment -

          This will revert to the original design. +1

          asrabkin Ari Rabkin added a comment -

          I always preferred detecting and correcting collector congestion on the collector side. We might also have collectors start spitting back HTTP errors if they cross some specified load average.
          +1 to patch.

          chris.douglas Chris Douglas added a comment -

          I just committed this. Thanks, Eric

          hudson Hudson added a comment -

          Integrated in Hadoop-trunk #684 (see http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/684/). Remove black list collector from Chukwa Agent HTTP Sender. Contributed by Eric Yang.

          chansler Robert Chansler added a comment -

          No release note for "just a bug."


            People

            • Assignee:
              eyang Eric Yang
              Reporter:
              eyang Eric Yang