Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-2219

ZooKeeper server should better handle SessionMovedException.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.4.5
    • None
    • None
    • None

    Description

      ZooKeeper server should better handle SessionMovedException.
      We hit the SessionMovedException. the following is the reason for the SessionMovedException we find:
      1. ZK client tried to connect to Leader L. Network was very slow, so before leader processed the request, client disconnected.
      2. Client then re-connected to Follower F reusing the same session ID. It was successful.
      3. The request in step 1 went into leader. Leader processed it and invalidated the connection created in step 2. But client didn't know the connection it used is invalidated.
      4. Client got SessionMovedException when it used the connection invalidated by leader for any ZooKeeper operation.

      The following are logs: c045dkh is the Leader, c470udy is the Follower and the sessionID is 0x14be28f50f4419d.
      1. ZK client try to initiate session to Leader at 015-03-16 10:59:40,735 and timeout after 10/3 seconds.
      logs from ZK client

      2015-03-16 10:59:40,078 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6670ms for sessionid 0x14be28f50f4419d, closing socket connection and attempting reconnect
      015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server c045dkh/?.?.?.67:2181. Will not attempt to authenticate using SASL (unknown error)
      2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to c045dkh/?.?.?.67:2181, initiating session
      2015-03-16 10:59:44,071 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 3336ms for sessionid 0x14be28f50f4419d, closing socket connection and attempting reconnect
      

      2. ZK client initiated session to Follower successfully at 2015-03-16 10:59:44,688
      logs from ZK client

      2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server c470udy/?.?.?.65:2181. Will not attempt to authenticate using SASL (unknown error)
      2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to c470udy/?.?.?.65:2181, initiating session
      2015-03-16 10:59:44,688 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server c470udy/?.?.?.65:2181, sessionid = 0x14be28f50f4419d, negotiated timeout = 10000
      

      logs from ZK Follower server

      2015-03-16 10:59:44,673 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /?.?.?.65:42777
      2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:42777
      2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.quorum.Learner: Revalidating client: 0x14be28f50f4419d
      2015-03-16 10:59:44,675 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14be28f50f4419d with negotiated timeout 10000 for client /?.?.?.65:42777
      

      3. At 2015-03-16 10:59:45,668, Leader processed the delayed request which is sent by Client at 2015-03-16 10:59:40,735, right after session was established, it close the socket connection/session because client was already disconnected due to timeout.
      logs from ZK Leader server

      2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:50271
      2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14be28f50f4419d with negotiated timeout 10000 for client /?.?.?.65:50271
      2015-03-16 10:59:45,670 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x14be28f50f4419d due to java.io.IOException: Broken pipe
      2015-03-16 10:59:45,671 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /?.?.?.65:50271 which had sessionid 0x14be28f50f4419d
      

      4. Client got SessionMovedException at 2015-03-16 10:59:45,693
      logs from ZK Leader server

      2015-03-16 10:59:45,693 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e3 zxid:0x1c002a4e53 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
      2015-03-16 10:59:45,695 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e5 zxid:0x1c002a4e56 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
      2015-03-16 10:59:45,700 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e7 zxid:0x1c002a4e57 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
      

      5. At 2015-03-16 10:59:45,710, we close the session 0x14be28f50f4419d but the socket connection between ZK client and ZK Follower is closed at 2015-03-16 10:59:45,715 after session termination.
      logs from ZK Leader server:

      2015-03-16 10:59:45,710 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x14be28f50f4419d
      

      logs from ZK Follower server:

      2015-03-16 10:59:45,715 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /?.?.?.65:42777 which had sessionid 0x14be28f50f4419d
      

      It looks like Zk client is out-of-sync with ZK server.
      My question is how ZK client can recover from this error. It looks like the ZK Client won't be disconnected from Follower until session is closed and any ZooKeeper operation will fail with SessionMovedException before session is closed.
      Also since ZK Leader already closed the socket connection/session to ZK Client at step 3, why it still reject the ZooKeeper operation from client with SessionMovedException. Will it be better to endorse the session/connection between ZK client and ZK Follower? This seems like a bug to me.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              zxu Zhihai Xu
              Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated: