Details
- Bug
- Status: Closed
- Minor
- Resolution: Fixed
- 3.5.9, 3.5.3, 3.6.3, 3.6.2
- None
Description
I believe this bug was originally reported as ZOOKEEPER-2966 but that was closed as not reproducible in February 2019. I left a comment with these details on that issue in December. I can create a PR with a fix at some point this week.
In ZooKeeper 3.6.2, in the context of the SolrJ client, we hit the NPE reported on ZOOKEEPER-2966. A DNS error causes an exception after SolrZkClient tries to connect to ZooKeeper, and SolrZkClient then immediately calls close on the ClientCnxn: https://github.com/apache/solr/blob/releases/lucene-solr%2F8.7.0/solr/solrj/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L158-L204
java.lang.NullPointerException: null
    at org.apache.zookeeper.ClientCnxnSocketNetty.onClosing(ClientCnxnSocketNetty.java:247) ~[zookeeper-3.6.2.jar:3.6.2]
    at org.apache.zookeeper.ClientCnxn$SendThread.close(ClientCnxn.java:1445) ~[zookeeper-3.6.2.jar:3.6.2]
    at org.apache.zookeeper.ClientCnxn.disconnect(ClientCnxn.java:1488) ~[zookeeper-3.6.2.jar:3.6.2]
    at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1517) ~[zookeeper-3.6.2.jar:3.6.2]
    at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:1614) ~[zookeeper-3.6.2.jar:3.6.2]
    at org.apache.solr.common.cloud.SolrZooKeeper.close(SolrZooKeeper.java:97) ~[solr-solrj-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:39:18]
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:198) ~[solr-solrj-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:39:18]
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:127) ~[solr-solrj-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:39:18]
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:122) ~[solr-solrj-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:39:18]
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:109) ~[solr-solrj-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:39:18]
This happens if the ClientCnxnSocketNetty's onClosing() is called before connect(...) (or if connect isn't called at all) because the firstConnect CountDownLatch is only initialized in connect(...).
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java#L129
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java#L247
A null check in onClosing() will fix it, but I don't know if there's any greater change required, e.g. some synchronization around connect and onClosing.
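To make the proposed null check concrete, here is a minimal sketch. It is a simplified stand-in, not the actual ClientCnxnSocketNetty source: the class name OnClosingSketch and the helper pendingFirstConnect() are hypothetical, and only the firstConnect field, connect() and onClosing() mirror names from the description above. It does not attempt the broader synchronization between connect and onClosing.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical, simplified model of the socket class; NOT the real ZooKeeper code.
class OnClosingSketch {
    // As in the description: the latch is only created inside connect(),
    // so it stays null if close happens before (or without) connect().
    private volatile CountDownLatch firstConnect;

    void connect() {
        firstConnect = new CountDownLatch(1);
    }

    void onClosing() {
        // Read the field once so the null check and the use see the same value.
        CountDownLatch latch = firstConnect;
        if (latch != null) {
            latch.countDown();
        }
    }

    // Hypothetical helper for inspection: -1 means connect() was never called.
    long pendingFirstConnect() {
        CountDownLatch latch = firstConnect;
        return latch == null ? -1 : latch.getCount();
    }

    public static void main(String[] args) {
        OnClosingSketch neverConnected = new OnClosingSketch();
        neverConnected.onClosing(); // would throw an NPE without the null check
        System.out.println("never connected: " + neverConnected.pendingFirstConnect());

        OnClosingSketch connected = new OnClosingSketch();
        connected.connect();
        connected.onClosing();
        System.out.println("connected then closed: " + connected.pendingFirstConnect());
    }
}
```

Copying the field into a local variable before the check avoids a second read between the check and the call, but it says nothing about ordering between connect and onClosing on different threads; whether that needs a lock is still the open question raised above.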
The code in 3.5.3 looks very similar, so the bug appears to have been present since the initial commit of ClientCnxnSocketNetty.
Issue Links
- duplicates: ZOOKEEPER-2966 Flaky NullPointerException while closing client connection (Resolved)
- is duplicated by: ZOOKEEPER-4525 Thread leaks occur when resolve address failed. (Resolved)