Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-3242

Add server side connecting throttling

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • 3.6.0
    • server

    Description

      On-going performance investigation at Facebook has demonstrated that Zookeeper is easily overwhelmed by spikes in connection rates and/or write request rates. Zookeeper performance gets progressively worse, clients timeout and try to reconnect (exacerbating the problem) and things enter a death spiral. To solve this problem, we need to add load protection to Zookeeper via rate limiting and work shedding.
       
      This Jira adds a new connection rate limiting mechanism to Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during connection spikes. 
      The new throttle is focused on limiting connections per second. The throttle is implemented as a token-bucket with optional probabilistic dropping based on the BLUE queue management algorithm.
       
      This token-bucket design allows the throttle to allow short bursts to pass, while still capping the total number of requests per second. However, an issue with a token bucket approach is that the wall clock arrival time of requests affects the probability of a request being allowed to pass or not. Under constant load this can lead to request starvation for requests that constantly arrive later than the majority. The optional probabilistic dropping mechanism is designed to combat this, making rejections a random event with little skew based on arrival time.
       
      A more verbose description can be found in the comments in org.apache.zookeeper.server.BlueThrottle.
       
      By default, both the token-bucket and probabilistic dropping mechanism are disabled. Enabling and tuning the throttles can be done both via Java system properties as well as against a running node via JMX.
       
      The throttle has been tested and benchmarked at Facebook.

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            jiehuangjie Jie Huang
            jiehuangjie Jie Huang
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 3h 10m
                3h 10m

                Slack

                  Issue deployment