Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Not A Problem
-
None
-
None
-
None
Description
I've seen this on occasion across different hardwares; the standalone zookeeper won't come up and then random unit test fails in its junit startup phase as part of launching mini cluster.
I've been trying to track this w/ a while now adding in logging, using more of zk client instead of the copy/paste we've had a long while now, and adding in logging (I've been running locally w/ zk logging set to INFO).
It looks like this currently where the last thing out of the server launch is this....
2020-03-02 07:57:46,129 INFO [Time-limited test] server.ZooKeeperServer(854): maxSessionTimeout set to -1 2020-03-02 07:57:46,139 INFO [Time-limited test] server.NIOServerCnxnFactory(89): binding to port 0.0.0.0/0.0.0.0:49316 2020-03-02 07:57:46,181 INFO [Time-limited test] zookeeper.MiniZooKeeperCluster(256): Started connectionTimeout=30000, dir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/ cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0, clientPort=49316, dataDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/ zookeeper_0/version-2, dataLogDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0/version-2, tickTime=2000, maxClientCnxns=300, minSessionTimeout=4000, maxSessionTimeout=40000, serverId=0
... then the client just does this over and over:
2020-03-02 07:57:46,182 INFO [Time-limited test] client.FourLetterWordMain(65): connecting to localhost 49316 2020-03-02 07:57:46,213 INFO [Time-limited test] zookeeper.MiniZooKeeperCluster(453): localhost:49316 not up java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:209) at java.net.SocketInputStream.read(SocketInputStream.java:141) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.readLine(BufferedReader.java:324) at java.io.BufferedReader.readLine(BufferedReader.java:389) at org.apache.zookeeper.client.FourLetterWordMain.send4LetterWord(FourLetterWordMain.java:84) at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.waitForServerUp(MiniZooKeeperCluster.java:442) at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:259) at org.apache.hadoop.hbase.HBaseZKTestingUtility.startMiniZKCluster(HBaseZKTestingUtility.java:130) at org.apache.hadoop.hbase.HBaseZKTestingUtility.startMiniZKCluster(HBaseZKTestingUtility.java:103) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1107) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1065) at org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable.before(TestRegionReplicasWithModifyTable.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:745)
There is ZOOKEEPER-2714 but it is closed as not-reproducible and it looks a little different going by the log emissions in that it registers the client connections.
Noting this here in case others have seen similar or there are ideas out there for how to fix.
Here for example is how the failure might look high-level:
------------------------------------------------------------------------------- Test set: org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable ------------------------------------------------------------------------------- Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.017 s <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable Time elapsed: 0.01 s <<< ERROR! java.io.IOException: Waiting for startup of standalone server; server isRunning=true at org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable.before(TestRegionReplicasWithModifyTable.java:62)
... and then when you dig in, the test never even made it past cluster launch in setup.
Attachments
Issue Links
- is related to
-
HBASE-23993 Use loopback for zk standalone server in minizkcluster
- Resolved