Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
3.4.6
-
Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
Java Version: jdk64/jdk1.8.0_40
zookeeper version: 3.4.6.2.3.2.0-2950
Description
QuorumCnxManager Listener thread blocks SocketServer on accept but we are getting SocketTimeoutException on our boxes after 49days 17 hours . As per current code there is a 3 times retry and after that it says "As I'm leaving the listener thread, I won't be able to participate in leader election any longer: $<hostname>/$<ip>:3888_" , Once server nodes reache this state and we restart or add a new node ,it fails to join cluster and logs 'WARN QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open channel to 3 at election address $<hostname>/$<ip>:3888' .
As there is no timeout specified for ServerSocket it should never timeout but there are some already discussed issues where people have seen this issue and added checks for SocketTimeoutException explicitly like https://issues.apache.org/jira/browse/KARAF-3325 .
I think we need to handle SocketTimeoutException on similar lines for zookeeper as well