ZooKeeper / ZOOKEEPER-803

Improve defenses against misbehaving clients

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      This issue is in response to ZOOKEEPER-801. In short, a small number of buggy clients opened thousands of connections and caused ZooKeeper to fail.

      The misbehaving client did not correctly handle expired sessions, creating a new connection each time. The huge number of connections exacerbated the issue.

          Activity

          Travis Crawford added a comment -

          This diff shows a bug in which the client developer confused disconnections with expired sessions. In the ZooKeeper programming model, clients reconnect automatically when disconnected. If the session expires, however, the application is responsible for reconnecting.

          In this case the developer attempted to throttle reconnects, but due to a bug the application created a new connection each time.

          A small number of clients running the buggy code took down a three-node ZooKeeper cluster by exhausting the 65k file descriptor limit. The cluster recovered only after shutting down the clients, restarting the ZooKeeper servers, and then restarting the well-behaved clients.
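
The distinction between a disconnection and an expired session can be sketched in a few lines. This is a minimal illustration, not the code from the diff and not the ZooKeeper client API: `SessionGuard`, `State`, and `Handle` are hypothetical stand-ins (the `State` values loosely mirror the client's `KeeperState` events), and time is passed in explicitly so the throttle is easy to follow. The sketch assumes that the old handle is closed before a new one is opened, so that at most one connection exists per client.

```java
import java.util.function.Supplier;

/**
 * Sketch only: models the session-state handling described above without
 * depending on the ZooKeeper client jar. All names here are hypothetical.
 */
final class SessionGuard {

    /** Stand-in for the client states discussed in the comment. */
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    /** Stand-in for a client handle; each open handle holds one connection. */
    interface Handle { void close(); }

    private final Supplier<Handle> factory; // opens a new handle
    private final long minIntervalMs;       // minimum gap between new sessions
    private Handle handle;                  // null while waiting to re-create
    private long lastCreateMs;
    private int created;

    SessionGuard(Supplier<Handle> factory, long minIntervalMs, long nowMs) {
        this.factory = factory;
        this.minIntervalMs = minIntervalMs;
        this.handle = factory.get();
        this.lastCreateMs = nowMs;
        this.created = 1;
    }

    /** Handles one state event; returns true if a new handle was opened. */
    synchronized boolean onEvent(State state, long nowMs) {
        if (state == State.EXPIRED && handle != null) {
            // The session is gone for good: release the old connection first.
            handle.close();
            handle = null;
        }
        if (handle != null) {
            // Connected or merely disconnected: the client library is assumed
            // to reconnect by itself, so the application opens nothing.
            return false;
        }
        if (nowMs - lastCreateMs < minIntervalMs) {
            // Throttled: the caller should retry later.
            return false;
        }
        handle = factory.get();
        lastCreateMs = nowMs;
        created++;
        return true;
    }

    synchronized int createdCount() { return created; }
}
```

In a real application the factory would presumably construct a new client handle, and `onEvent` would be driven from the watcher callback and from a retry timer. The point of the sketch is only that a disconnection never opens a connection, and an expiry opens at most one.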

          Patrick Hunt added a comment -

          Thanks for this. Approximately how many clients are we talking about?

          Travis Crawford added a comment -

          Maybe 8-10 clients were running the buggy code. Not too many.


            People

            • Assignee: Unassigned
            • Reporter: Travis Crawford
            • Votes: 0
            • Watchers: 2
