Details
-
Bug
-
Status: In Progress
-
Major
-
Resolution: Unresolved
-
3.9.2
-
None
-
None
Description
We upgraded our ZooKeeper ensemble from 3.9.1 to 3.9.2. TLS (node-to-node and client-to-node) was already enabled before the upgrade, and everything was working fine.
After the upgrade, we did the following:
- Stopped everything (all ZK nodes).
- Started all ZK nodes.
- Checked whether SSL between the ZK nodes was working.
- Confirmed that SSL works fine between the ZK nodes.
- Started just one instance of the client application.
- After that, we see intermittent successful and unsuccessful handshake messages in the ZK logs.
On the ZK server side, we see the following messages:
2024-11-21 13:28:15,586 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.c.X509Util@599] - FIPS mode is ON: selecting standard x509 trust manager com.ibm.jsse2.br@4362299c
2024-11-21 13:28:15,586 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.c.X509Util@644] - Using Java8 optimized cipher suites for Java version 1.8
2024-11-21 13:28:15,588 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxnFactory@596] - SSL handler added for channel: [id: 0x2443db1c, L:/10.1.10.50:2181 - R:/10.1.10.46:57272]
2024-11-21 13:28:15,620 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxnFactory$CertificateVerifier@415] - Successful handshake with session 0x0
2024-11-21 13:28:15,620 [myid:] - DEBUG [epollEventLoopGroup-4-9:i.n.h.s.SslHandler@1934] - [id: 0x2443db1c, L:/10.1.10.50:2181 - R:/10.1.10.46:57272] HANDSHAKEN: protocol:TLSv1.3 cipher suite:TLS_AES_256_GCM_SHA384
2024-11-21 13:28:15,622 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxnFactory$CnxnChannelHandler@350] - New message PooledUnsafeDirectByteBuf(ridx: 0, widx: 4, cap: 42) from [id: 0x2443db1c, L:/10.1.10.50:2181 - R:/10.1.10.46:57272]
2024-11-21 13:28:15,622 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@368] - 0x0 queuedBuffer: null
2024-11-21 13:28:15,622 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@386] - not throttled
2024-11-21 13:28:15,623 [myid:] - INFO [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@311] - Processing mntr command from /10.1.10.46:57272
2024-11-21 13:28:15,642 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@113] - close called for session id: 0x0
2024-11-21 13:28:15,642 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@131] - close in progress for session id: 0x0
2024-11-21 13:28:15,644 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@113] - close called for session id: 0x0
2024-11-21 13:28:15,644 [myid:] - DEBUG [epollEventLoopGroup-4-9:o.a.z.s.NettyServerCnxn@124] - cnxns size:0
2024-11-21 13:28:17,155 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.c.X509Util@599] - FIPS mode is ON: selecting standard x509 trust manager com.ibm.jsse2.br@a5cca67c
2024-11-21 13:28:17,156 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.c.X509Util@644] - Using Java8 optimized cipher suites for Java version 1.8
2024-11-21 13:28:17,158 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxnFactory@596] - SSL handler added for channel: [id: 0xb818882d, L:/10.1.10.50:2181 - R:/10.1.10.46:57276]
2024-11-21 13:28:17,161 [myid:] - ERROR [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxnFactory$CertificateVerifier@466] - Unsuccessful handshake with session 0x0
2024-11-21 13:28:17,161 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@113] - close called for session id: 0x0
2024-11-21 13:28:17,162 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@124] - cnxns size:0
2024-11-21 13:28:17,163 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@113] - close called for session id: 0x0
2024-11-21 13:28:17,163 [myid:] - DEBUG [epollEventLoopGroup-4-10:o.a.z.s.NettyServerCnxn@124] - cnxns size:0
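To see whether the failures cluster on particular remote hosts, the server log excerpt above can be tallied with a small program. The following is only a rough sketch: the class name `HandshakeTally` and its method are hypothetical, and it assumes that a handshake result line belongs to the most recent "SSL handler added for channel" line logged by the same event loop thread. That pairing is a heuristic, because the handshake result lines do not carry the channel id and a single thread can serve several channels.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ZooKeeper. Written for Java 8.
public class HandshakeTally {
    // Thread name from the log prefix, e.g. "epollEventLoopGroup-4-10".
    private static final Pattern THREAD =
            Pattern.compile("\\[(epollEventLoopGroup-[\\d-]+):");
    // Remote IP and port from "SSL handler added for channel: [... R:/10.1.10.46:57276]".
    private static final Pattern REMOTE =
            Pattern.compile("SSL handler added for channel: .*R:/([\\d.]+):(\\d+)\\]");

    /** Returns remote IP -> {successful handshakes, unsuccessful handshakes}. */
    public static Map<String, int[]> tally(List<String> lines) {
        Map<String, String> lastRemoteByThread = new HashMap<>();
        Map<String, int[]> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher t = THREAD.matcher(line);
            if (!t.find()) {
                continue; // not a Netty event loop line
            }
            String thread = t.group(1);
            Matcher r = REMOTE.matcher(line);
            if (r.find()) {
                // Remember which remote this thread most recently attached an SSL handler for.
                lastRemoteByThread.put(thread, r.group(1));
                continue;
            }
            // Check the failure text first; the two messages differ only by the "Un" prefix.
            boolean failed = line.contains("Unsuccessful handshake");
            boolean ok = !failed && line.contains("Successful handshake");
            if (!ok && !failed) {
                continue;
            }
            String remote = lastRemoteByThread.getOrDefault(thread, "unknown");
            int[] c = counts.computeIfAbsent(remote, k -> new int[2]);
            c[ok ? 0 : 1]++;
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a ZooKeeper server log file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (Map.Entry<String, int[]> e : tally(lines).entrySet()) {
            System.out.println(e.getKey() + " ok=" + e.getValue()[0]
                    + " failed=" + e.getValue()[1]);
        }
    }
}
```

If the failures turn out to come from one host or one kind of caller (for example a monitoring probe rather than the application client), that narrows down which side's TLS setup to look at first.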
On the client side, we see the following message intermittently:
17:37:43.878 [pool-7-thread-1-SendThread(10.1.10.50:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server bdc-dev1807.in.syncsort.dev/10.1.10.50:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$EndOfStreamException: channel for sessionid 0x0 is lost
at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:287)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1274)
We also see successful SSL connections on the client side:
INFO: Connected via SSL to server : 10.1.10.50 @ port : 2181
Nov 21, 2024 5:46:04 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
INFO: Connected via SSL to server : 10.1.10.46 @ port : 2181
Nov 21, 2024 5:46:09 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
INFO: Connected via SSL to server : 10.1.10.46 @ port : 2181
Nov 21, 2024 5:46:09 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
INFO: Connected via SSL to server : 10.1.10.50 @ port : 2181
Nov 21, 2024 5:46:14 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
INFO: Connected via SSL to server : 10.1.10.46 @ port : 2181
Nov 21, 2024 5:46:14 PM com.ibm.mailbox.zkwatchdog.ZKCommandClient connect
INFO: Connected via SSL to server : 10.1.10.50 @ port : 2181
We have not set any TLS protocol version or cipher suites on either the client or the server side.
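Since nothing is pinned, both sides fall back to defaults. One way to take protocol negotiation out of the picture while debugging is to pin the protocol explicitly on the client. The sketch below is an assumption-laden illustration, not a recommended fix: the class `PinClientTls` is hypothetical, the value `TLSv1.2` is only an example, and it applies only to clients that connect through `org.apache.zookeeper.ZooKeeper`. The property names `zookeeper.ssl.protocol` and `zookeeper.ssl.enabledProtocols` are the ones listed in the ZooKeeper administrator's guide, and they must be set before the client object is created.

```java
// Hypothetical helper for a debugging experiment. Written for Java 8.
public class PinClientTls {
    /**
     * Sets the ZooKeeper client TLS system properties to a single protocol.
     * Call this before constructing the ZooKeeper client.
     */
    public static void pin(String protocol) {
        // Protocol used when the client creates its SSL context.
        System.setProperty("zookeeper.ssl.protocol", protocol);
        // Protocols the client offers during the handshake.
        System.setProperty("zookeeper.ssl.enabledProtocols", protocol);
    }

    public static void main(String[] args) {
        // "TLSv1.2" is an illustrative default, not a value taken from this report.
        pin(args.length > 0 ? args[0] : "TLSv1.2");
        System.out.println("zookeeper.ssl.protocol="
                + System.getProperty("zookeeper.ssl.protocol"));
        System.out.println("zookeeper.ssl.enabledProtocols="
                + System.getProperty("zookeeper.ssl.enabledProtocols"));
    }
}
```

If the intermittent failures disappear with a pinned protocol, the default negotiation between the two sides is a likely suspect; if they persist, the cause is probably elsewhere.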
We are using IBM JDK 8.
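Because the defaults depend on the JVM, it may help to record what this JDK actually enables. The following is a minimal sketch using only the standard `javax.net.ssl` API; the class name `TlsDefaults` is hypothetical, and it reports the JVM's default `SSLContext`, which is not necessarily the context ZooKeeper builds from its own key store and trust store settings.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

// Hypothetical diagnostic, not part of ZooKeeper. Written for Java 8.
public class TlsDefaults {
    /** Describes the JVM vendor/version and its default and supported TLS settings. */
    public static String describe() throws NoSuchAlgorithmException {
        SSLContext ctx = SSLContext.getDefault();
        SSLParameters def = ctx.getDefaultSSLParameters();
        SSLParameters sup = ctx.getSupportedSSLParameters();
        StringBuilder sb = new StringBuilder();
        sb.append("java.vendor=").append(System.getProperty("java.vendor")).append('\n');
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');
        sb.append("Default protocols: ")
          .append(Arrays.toString(def.getProtocols())).append('\n');
        sb.append("Supported protocols: ")
          .append(Arrays.toString(sup.getProtocols())).append('\n');
        sb.append("Default cipher suites: ")
          .append(Arrays.toString(def.getCipherSuites())).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.print(describe());
    }
}
```

Running this with the same JVM and the same JVM options as the ZK server and as the client application makes it possible to compare the two lists side by side.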
Please help us troubleshoot this issue.