HBASE-10283

Client can't connect with all the running zk servers in MiniZooKeeperCluster

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.94.3
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Following HBASE-3052, multiple ZooKeeper (zk) servers can run together in the mini cluster. The problem is that the client can only connect to the first zk server; if you kill that one, the client can no longer access the cluster even though the other zk servers are still serving.

      It's easy to reproduce. First, call `TEST_UTIL.startMiniZKCluster(3)`. Second, call `killCurrentActiveZooKeeperServer` on the MiniZooKeeperCluster. After that, a newly constructed zk client cannot connect to the zk cluster at all. Here is a short log excerpt for reference.

      2014-01-03 12:06:58,625 INFO  [main] zookeeper.MiniZooKeeperCluster(194): Started MiniZK Cluster and connect 1 ZK server on client port: 55227
      ......
      2014-01-03 12:06:59,134 INFO  [main] zookeeper.MiniZooKeeperCluster(264): Kill the current active ZK servers in the cluster on client port: 55227
      2014-01-03 12:06:59,134 INFO  [main] zookeeper.MiniZooKeeperCluster(272): Activate a backup zk server in the cluster on client port: 55228
      2014-01-03 12:06:59,366 INFO  [main-EventThread] zookeeper.ZooKeeper(434): Initiating client connection, connectString=localhost:55227 sessionTimeout=3000 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118
      (then it throws exceptions......)
      

      The log itself is misleading: it always reports "Started MiniZK Cluster and connect 1 ZK server", although three zk servers are actually running.

      Looking deeper, we find that the client is still trying to connect to the dead zk server's port. When I print out the zkQuorum it uses, only the first zk server's host:port is there, and it does not change whether or not you kill that server. The cause is in ZKConfig, which converts HBase settings into zk settings. MiniZooKeeperCluster creates three servers with the same host name, "localhost", and different ports. But HBase itself forces the same port for every zk server, and ZKConfig ignores the other two servers because they have the same host name.
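
      The collapse described above can be illustrated with a small, self-contained sketch. It is not the actual ZKConfig code: the class `QuorumSketch`, both method names, and the third port 55229 are hypothetical and exist only for illustration. It assumes the quorum is kept as a list of host names plus one shared client port, which is the behavior the description attributes to HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; this is NOT the real ZKConfig implementation. */
final class QuorumSketch {

    /**
     * Models a host-only quorum with one shared client port.
     * Servers that share a host name collapse into a single entry.
     */
    static String hostOnlyConnectString(List<String> hosts, int clientPort) {
        Set<String> uniqueHosts = new LinkedHashSet<>(hosts); // drops repeated host names
        List<String> parts = new ArrayList<>();
        for (String host : uniqueHosts) {
            parts.add(host + ":" + clientPort);
        }
        return String.join(",", parts);
    }

    /** Keeps one host:port entry per server, so every server stays reachable. */
    static String perServerConnectString(String host, List<Integer> clientPorts) {
        List<String> parts = new ArrayList<>();
        for (int port : clientPorts) {
            parts.add(host + ":" + port);
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        // Three mini-cluster servers, all on "localhost"; 55229 is an assumed port.
        List<String> hosts = Arrays.asList("localhost", "localhost", "localhost");
        System.out.println(hostOnlyConnectString(hosts, 55227));
        System.out.println(perServerConnectString("localhost", Arrays.asList(55227, 55228, 55229)));
    }
}
```

      Under these assumptions the first call yields only `localhost:55227`, matching the `connectString` in the log, while the second keeps all three servers. A fix would presumably need the quorum to carry a port per server, but the sketch does not claim that is how the attached patch works.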

      MiniZooKeeperCluster will not work properly until this is fixed. The bug went unnoticed because no unit test checks whether HBase keeps working after the active or a backup zk server is killed.

        Activity

        chendihao created issue -
        chendihao made changes -
        Field: Description
        Change: added the line "(then it throws exceptions......)" at the end of the log excerpt; the rest of the text is unchanged.
        chendihao made changes -
        Field: Description
        Change: "But HBase self use the port" became "But HBase self force to use the same port for each zk server"; the rest of the text is unchanged.
        chendihao made changes -
        Field: Description
        Change: removed the closing sentence "But apparently we should."; the rest of the text is unchanged.
        chendihao made changes -
        Assignee chendihao [ tobe ]
        chendihao made changes -
        Attachment HBASE-10283-0.94-v1.patch [ 12622621 ]

          People

          • Assignee:
            chendihao
            Reporter:
            chendihao
          • Votes:
            0
            Watchers:
            1
