Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-23919

[Flakey Test] Standalone Zookeeper won't start (minizookeepercluster won't come up)

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Not A Problem
    • None
    • None
    • flakies
    • None

    Description

      I've seen this on occasion across different hardwares; the standalone zookeeper won't come up and then random unit test fails in its junit startup phase as part of launching mini cluster.

      I've been trying to track this w/ a while now adding in logging, using more of zk client instead of the copy/paste we've had a long while now, and adding in logging (I've been running locally w/ zk logging set to INFO).

      It looks like this currently where the last thing out of the server launch is this....

      
        2020-03-02 07:57:46,129 INFO  [Time-limited test] server.ZooKeeperServer(854): maxSessionTimeout set to -1                                                                                                                            2020-03-02 07:57:46,139 INFO  [Time-limited test] server.NIOServerCnxnFactory(89): binding to port 0.0.0.0/0.0.0.0:49316                                                                                                              2020-03-02 07:57:46,181 INFO  [Time-limited test] zookeeper.MiniZooKeeperCluster(256): Started connectionTimeout=30000, dir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/      cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0, clientPort=49316, dataDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/ zookeeper_0/version-2, dataLogDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0/version-2, tickTime=2000,              maxClientCnxns=300, minSessionTimeout=4000, maxSessionTimeout=40000, serverId=0

      ... then the client just does this over and over:

       2020-03-02 07:57:46,182 INFO  [Time-limited test] client.FourLetterWordMain(65): connecting to localhost 49316                                                                                                                        2020-03-02 07:57:46,213 INFO  [Time-limited test] zookeeper.MiniZooKeeperCluster(453): localhost:49316 not up                                                                                                                         java.net.SocketException: Connection reset                                                                                                                                                                                              at java.net.SocketInputStream.read(SocketInputStream.java:209)                                                                                                                                                                        at java.net.SocketInputStream.read(SocketInputStream.java:141)                                                                                                                                                                        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)                                                                                                                                                                         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)                                                                                                                                                                          at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)                                                                                                                                                                              at java.io.InputStreamReader.read(InputStreamReader.java:184)                                                                                                                                                                         at java.io.BufferedReader.fill(BufferedReader.java:161)                                                                                                                                                                               at java.io.BufferedReader.readLine(BufferedReader.java:324)                                                                                                                                                                           at java.io.BufferedReader.readLine(BufferedReader.java:389)                                                                                                                                                                           at org.apache.zookeeper.client.FourLetterWordMain.send4LetterWord(FourLetterWordMain.java:84)                                                                                                                                         at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.waitForServerUp(MiniZooKeeperCluster.java:442)                                                                                                                              at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:259)                                                                                                                                      at org.apache.hadoop.hbase.HBaseZKTestingUtility.startMiniZKCluster(HBaseZKTestingUtility.java:130)                                                                                                                                   at org.apache.hadoop.hbase.HBaseZKTestingUtility.startMiniZKCluster(HBaseZKTestingUtility.java:103)                                                                                                                                   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1107)                                                                                                                                        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1065)                                                                                                                                        at org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable.before(TestRegionReplicasWithModifyTable.java:62)                                                                                                           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                                                                                                                                        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)                                                                                                                                                      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)                                                                                                                                              at java.lang.reflect.Method.invoke(Method.java:498)                                                                                                                                                                                   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)                                                                                                                                               at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)                                                                                                                                                at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)                                                                                                                                                 at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)                                                                                                                                                  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)                                                                                                                                                      at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)                                                                                                                                                        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)                                                                                                                                 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)                                                                                                                                 at java.util.concurrent.FutureTask.run(FutureTask.java:266)                                                                                                                                                                           at java.lang.Thread.run(Thread.java:745)
       

       

      There is ZOOKEEPER-2714 but it is closed as not-reproducible and it looks a little different going by the log emissions in that it registers the client connections.

      Noting this here in case others have seen similar or there are ideas out there for how to fix.

      Here for example is how the failure might look high-level:

      
       -------------------------------------------------------------------------------
       Test set: org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable
       -------------------------------------------------------------------------------
       Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.017 s <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable
       org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable  Time elapsed: 0.01 s  <<< ERROR!
       java.io.IOException: Waiting for startup of standalone server; server isRunning=true
         at 
      org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable.before(TestRegionReplicasWithModifyTable.java:62) 

      ... and then when you dig in, the test never even made it past cluster launch in setup. 

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              stack Michael Stack
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: