Hadoop Common / HADOOP-9640

RPC Congestion Control with FairCallQueue



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 2.2.0, 3.0.0-alpha1
    • None
    • None
    • Enable optional RPC-level priority to combat congestion and make request latencies more consistent.


      For an easy-to-read summary see: http://www.ebaytechblog.com/2014/08/21/quality-of-service-in-hadoop/

      In several production Hadoop cluster incidents, the NameNode was overloaded and failed to respond.

      We can improve quality of service for users during NameNode peak loads by replacing the FIFO call queue with a Fair Call Queue. (This plan supersedes rpc-congestion-control-draft-plan.)

      Excerpted from the communication of one incident, “The map task of a user was creating huge number of small files in the user directory. Due to the heavy load on NN, the JT also was unable to communicate with NN...The cluster became responsive only once the job was killed.”

      Excerpted from the communication of another incident, “Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests. The job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to namenode were also affected by this and hence all jobs slowed down. Cluster almost came to a grinding halt…Eventually killed jobtracker to kill all jobs that are running.”

      Excerpted from HDFS-945, “We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.”
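      The difference between a FIFO call queue and a fair one can be illustrated with a minimal sketch. It is not the code from the attached patches: the class name `FairCallQueueSketch`, the rule that maps a user's share of submitted calls to a priority level, and the per-level weights are all assumptions chosen for illustration. Calls from users who dominate the traffic land in a lower-priority sub-queue, and a weighted round-robin reader drains the sub-queues so that light users are not stuck behind a flood from one job.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Illustrative sketch only; names and policy are hypothetical. */
public class FairCallQueueSketch {
    // One FIFO sub-queue per priority level; index 0 is the highest priority.
    private final List<Queue<String>> levels = new ArrayList<>();
    // Assumed policy: how many calls a level may serve before the reader moves on.
    private final int[] weights;
    private final Map<String, Integer> callsByUser = new HashMap<>();
    private int totalCalls = 0;
    private int currentLevel = 0;
    private int servedAtLevel = 0;

    public FairCallQueueSketch(int[] weights) {
        this.weights = weights.clone();
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] < 1) {
                throw new IllegalArgumentException("weights must be >= 1");
            }
            levels.add(new ArrayDeque<>());
        }
    }

    /** Enqueue a call; the heavier the user's share of traffic, the lower the priority. */
    public void put(String user, String call) {
        int count = callsByUser.merge(user, 1, Integer::sum);
        totalCalls++;
        // Simplification: share over all calls seen so far, with no decay over time.
        double share = (double) count / totalCalls;
        int level = Math.min(levels.size() - 1, (int) (share * levels.size()));
        levels.get(level).add(call);
    }

    /** Weighted round-robin read across levels; returns null when every level is empty. */
    public String poll() {
        // n + 1 probes: the current level may have used up its weight, so it
        // gets one more chance with a fresh counter after a full rotation.
        for (int tried = 0; tried <= levels.size(); tried++) {
            Queue<String> q = levels.get(currentLevel);
            if (!q.isEmpty() && servedAtLevel < weights[currentLevel]) {
                servedAtLevel++;
                return q.poll();
            }
            currentLevel = (currentLevel + 1) % levels.size();
            servedAtLevel = 0;
        }
        return null;
    }
}
```

      With weights `{2, 1}`, if a user "bulk" enqueues four calls and a user "alice" then enqueues one, alice's call is read first even though it arrived last; a plain FIFO queue would serve it fifth.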


        Attachments

        1. rpc-congestion-control-draft-plan.pdf (488 kB, Xiaobo Peng)
        2. faircallqueue.patch (41 kB, Chris Li)
        3. NN-denial-of-service-updated-plan.pdf (2.76 MB, Chris Li)
        4. MinorityMajorityPerformance.pdf (72 kB, Chris Li)
        5. faircallqueue2.patch (73 kB, Chris Li)
        6. faircallqueue3.patch (73 kB, Chris Li)
        7. faircallqueue4.patch (74 kB, Chris Li)
        8. faircallqueue5.patch (73 kB, Chris Li)
        9. faircallqueue6.patch (74 kB, Chris Li)
        10. faircallqueue7_with_runtime_swapping.patch (134 kB, Chris Li)
        11. FairCallQueue-PerformanceOnCluster.pdf (694 kB, Chris Li)



            chrilisf Chris Li
            teledriver Xiaobo Peng
            Votes: 5
            Watchers: 91



