Description
In some runs where the ZooKeeper server is unavailable, the Curator client waits and hangs indefinitely. The thread dump stack trace is shown
below.
Because this does not reproduce consistently, it looks like a race condition in the Curator/ZooKeeper client. The wait is never bounded because zookeeper.request.timeout cannot be configured in the Curator client.
As a workaround, initialization is executed in a separate thread so that it can be interrupted if it hangs. This has been identified and handled here:
join_while_zookeeper_down_issue
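The workaround described above could look roughly like the following sketch. It uses only the JDK; the class name GuardedInit and the method runWithTimeout are hypothetical and are not taken from the linked change. In the real case, the Callable would wrap the call that hangs (for example the service discovery start shown in the stack trace).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs an initialization step on a separate thread
// and interrupts it if it does not finish within the given time.
public class GuardedInit {

    /**
     * Returns true if init completed in time, false if it was interrupted
     * after timeoutMillis. Exceptions thrown by init are rethrown.
     */
    public static boolean runWithTimeout(Callable<Void> init, long timeoutMillis) throws Exception {
        // Daemon thread, so a stuck worker cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "guarded-init");
            t.setDaemon(true);
            return t;
        });
        Future<Void> future = executor.submit(init);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker; a thread blocked in
            // Object.wait() then receives an InterruptedException.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

This only bounds the wait from the outside: it relies on the blocked call reacting to interruption, which holds for the Object.wait() frame at the top of the stack trace.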
The desired solution is to expose a configuration option for zookeeper.request.timeout, so that the client waits only until the request timeout expires. The timeout is handled in org.apache.zookeeper.ClientCnxn.submitRequest().
stacktrace size: 31
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:502)
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1561)
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1533)
org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1834)
org.apache.curator.framework.imps.CreateBuilderImpl$16.call(CreateBuilderImpl.java:1131)
org.apache.curator.framework.imps.CreateBuilderImpl$16.call(CreateBuilderImpl.java:1113)
org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93)
org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1110)
org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:593)
org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:583)
org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:48)
org.apache.curator.x.discovery.details.ServiceDiscoveryImpl.internalRegisterService(ServiceDiscoveryImpl.java:237)
org.apache.curator.x.discovery.details.ServiceDiscoveryImpl.reRegisterServices(ServiceDiscoveryImpl.java:456)
org.apache.curator.x.discovery.details.ServiceDiscoveryImpl.start(ServiceDiscoveryImpl.java:135)
...
Issue Links
- causes
  - HADOOP-18870 CURATOR-599 change broke functionality introduced in HADOOP-18139 and HADOOP-18709 (Resolved)
- supercedes
  - CURATOR-309 ConnectionState:244 - Authentication failed (Closed)