ZOOKEEPER-4440: ZooKeeper upgrade fails when disabling plain-text communication; ensemble fails to form


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.7.0
    • Fix Version/s: None
    • Component/s: server
    • Labels: None
    • Environment: Kubernetes 1.21.1

    Description

      We have a three-node ZooKeeper cluster running as pods on a Kubernetes cluster.
      The ZooKeeper version is 3.7.0. We hit a problem while upgrading ZooKeeper from plain-text + secure mode to secure-only mode (i.e. disabling the plain-text channel).

      1. To disable plain-text communication, we remove <clientport> from the dynamic configuration file so that only secure communication is enabled. After the upgrade, the ZooKeeper ensemble fails to form: leader election fails continuously and we get notification timeouts.

      #server configuration
      server.1=server1zookeeper.svc.cluster.local:2888:3888:participant
      server.2=server2zookeeper.svc.cluster.local:2888:3888:participant
      server.3=server3zookeeper.svc.cluster.local:2888:3888:participant
      
      #secure port enabled
      secureClientPort=2281
      
       
      2021-05-19T08:00:06.900+0000 [myid:] - WARN [QuorumConnectionThread-[myid=3]-3:QuorumCnxManager@400] - Cannot open channel to 1 at election address server1zookeeper/192.168.57.156:3888
      java.net.SocketTimeoutException: connect timed out
              at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
              at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[?:?]
              at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[?:?]
              at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[?:?]
              at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
              at java.net.Socket.connect(Socket.java:609) ~[?:?]
              at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:383) [zookeeper-3.7.0.jar:3.7.0]
              at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457) [zookeeper-3.7.0.jar:3.7.0]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
              at java.lang.Thread.run(Thread.java:834) [?:?]
      
      2021-05-19T07:47:56.894+0000 [myid:] - INFO  [QuorumPeer[myid=1](plain=disabled)(secure=0.0.0.0:2281):FastLeaderElection@979] - Notification time out: 60000
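
The warning above is a TCP connect timeout on the election port, so one thing worth ruling out is plain network reachability between the pods. The following is a minimal Python sketch of such a check, not part of ZooKeeper itself. The helper names `can_connect` and `check_peers` are hypothetical; the hostnames and the election port 3888 are copied from the server configuration in this report.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and connect timeouts.
        return False

# Election addresses taken from the server.N lines in this report.
PEERS = [
    ("server1zookeeper.svc.cluster.local", 3888),
    ("server2zookeeper.svc.cluster.local", 3888),
    ("server3zookeeper.svc.cluster.local", 3888),
]

def check_peers(peers, timeout=5.0):
    """Map each 'host:port' string to whether a TCP connect succeeded."""
    return {
        "{}:{}".format(host, port): can_connect(host, port, timeout)
        for host, port in peers
    }
```

Running `check_peers(PEERS)` from inside each ZooKeeper pod would show whether the election port of every peer accepts connections. A result of `False` only says that the TCP connect failed; it does not by itself explain why the ensemble fails to form.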
      
      

      2. We also tried to reconfigure the ensemble from the CLI using zkCli.sh, but this does not work either. We ran "reconfig -members" with the server details, but the ZooKeeper ensemble is not updated and the command fails with an error. We created a DigestAuthenticationProvider user to allow reconfig.

      [zk: zookeeper:2281(CONNECTED) 0]
      [zk: zookeeper:2281(CONNECTED) 0]
      [zk: zookeeper:2281(CONNECTED) 0] config
      server.1=server1zookeeper.svc.cluster.local:2888:3888:participant
      server.2=server2zookeeper.svc.cluster.local:2888:3888:participant
      server.3=server3zookeeper.svc.cluster.local:2888:3888:participant
      version=1700000000
      [zk: zookeeper:2281(CONNECTED) 1]
      [zk: zookeeper:2281(CONNECTED) 1]
      [zk: zookeeper:2281(CONNECTED) 1] addauth digest zookeeper:admin
      [zk: zookeeper:2281(CONNECTED) 2]
      [zk: zookeeper:2281(CONNECTED) 2]
      [zk: zookeeper:2281(CONNECTED) 2] reconfig -members server.1=server1zookeeper.svc.cluster.local:2888:3888:participant;0.0.0.0:2181,server.2=server2zookeeper.svc.cluster.local:2888:3888:participant;0.0.0.0:2181,server.3=server3zookeeper.svc.cluster.local:2888:3888:participant;0.0.0.0:2181
      2021-05-19T08:16:43.376+0000 [myid:zookeeper:2281] - WARN  [main-SendThread(zookeeper:2281):ClientCnxn$SendThread@1242] - Client session timed out, have not heard from server in 20000ms for session id 0x30169d99fdf0000
      2021-05-19T08:16:43.377+0000 [myid:zookeeper:2281] - WARN  [main-SendThread(zookeeper:2281):ClientCnxn$SendThread@1285] - Session 0x30169d99fdf0000 for sever zookeeper/10.107.240.229:2281, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
      org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 20000ms for session id 0x30169d99fdf0000
              at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243) [zookeeper-3.7.0.jar:3.7.0]
      WATCHER::
      WatchedEvent state:Disconnected type:None path:null
      2021-05-19T08:16:43.390+0000 [myid:] - INFO  [nioEventLoopGroup-2-1:ClientCnxnSocketNetty$ZKClientHandler@469] - channel is disconnected: [id: 0xa97b55e0, L:/192.168.220.12:47114 ! R:zookeeper/10.107.240.229:2281]
      2021-05-19T08:16:43.392+0000 [myid:] - INFO  [nioEventLoopGroup-2-1:ClientCnxnSocketNetty@249] - channel is told closing
      KeeperErrorCode = ConnectionLoss
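
In the transcript above, the `reconfig -members` argument differs from the `config` output only by the `;0.0.0.0:2181` suffix on each server entry. A small Python sketch can make that difference explicit by splitting each entry into its parts. It assumes the `server.<id>=<host>:<quorum port>:<election port>:<role>[;<client address>]` layout seen in this report; `Member` and `parse_members` are hypothetical names, not ZooKeeper APIs.

```python
from typing import List, NamedTuple, Optional

class Member(NamedTuple):
    server_id: int
    host: str
    quorum_port: int
    election_port: int
    role: str
    client_addr: Optional[str]  # e.g. "0.0.0.0:2181", or None if absent

def parse_members(spec: str) -> List[Member]:
    """Parse a comma-separated list of server.N=... entries."""
    members = []
    for entry in spec.split(","):
        key, _, value = entry.strip().partition("=")
        server_id = int(key.split(".", 1)[1])
        # Everything after ';' is the optional client address.
        peer, _, client = value.partition(";")
        host, quorum_port, election_port, role = peer.split(":")
        members.append(
            Member(server_id, host, int(quorum_port), int(election_port),
                   role, client or None)
        )
    return members
```

Applied to the `reconfig -members` argument, every entry yields `client_addr == "0.0.0.0:2181"`; applied to the lines printed by `config`, joined with commas, every entry yields `client_addr is None`.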
      

      Kindly suggest how to perform the upgrade with the desired changes in a way that also supports rollback.

          People

            Assignee: Unassigned
            Reporter: Anoop Negi (anoopnegi)