Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-2836

QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.4.6
    • Fix Version/s: None
    • Component/s: leaderElection, quorum
    • Environment:

      Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
      Java Version: jdk64/jdk1.8.0_40
      zookeeper version: 3.4.6.2.3.2.0-2950

      Description

      QuorumCnxManager Listener thread blocks SocketServer on accept but we are getting SocketTimeoutException on our boxes after 49days 17 hours . As per current code there is a 3 times retry and after that it says "As I'm leaving the listener thread, I won't be able to participate in leader election any longer: $<hostname>/$<ip>:3888_" , Once server nodes reache this state and we restart or add a new node ,it fails to join cluster and logs 'WARN QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open channel to 3 at election address $<hostname>/$<ip>:3888' .

      As there is no timeout specified for ServerSocket it should never timeout but there are some already discussed issues where people have seen this issue and added checks for SocketTimeoutException explicitly like https://issues.apache.org/jira/browse/KARAF-3325 .

      I think we need to handle SocketTimeoutException on similar lines for zookeeper as well

        Attachments

          Activity

            People

            • Assignee:
              gaoshu gaoshu
              Reporter:
              amarjeet_singh Amarjeet Singh
            • Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 10m
                10m