Affects Version/s: 3.5.9, 3.5.3, 3.6.3, 3.6.2
Fix Version/s: None
I believe this bug was originally reported as ZOOKEEPER-2966, but that issue was closed as not reproducible in February 2019. I left a comment with these details on that issue in December. I can create a PR with a fix at some point this week.
In ZooKeeper 3.6.2, in the context of the SolrJ client, we hit the NPE reported on ZOOKEEPER-2966 when the SolrZkClient tries to connect to ZooKeeper, a DNS error causes an exception, and the client then immediately calls close on the ClientCnxn (see https://github.com/apache/solr/blob/releases/lucene-solr%2F8.7.0/solr/solrj/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L158-L204).
This happens if the ClientCnxnSocketNetty's onClosing() is called before connect(...) (or if connect isn't called at all) because the firstConnect CountDownLatch is only initialized in connect(...).
A null check in onClosing() will fix it, but I don't know if there's any greater change required, e.g. some synchronization around connect and onClosing.
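To illustrate the null-check idea, here is a minimal sketch using a simplified stand-in class, not the actual ClientCnxnSocketNetty source. The class name SketchSocket, the closing flag, the volatile modifier, and the two onClosing variants are hypothetical; only firstConnect, connect(...) and onClosing() are taken from the description above.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical, simplified model of the pattern described above; not ZooKeeper code. */
class SketchSocket {
    // Assigned only in connect(), so it is still null if connect() never ran.
    // Marking it volatile is an assumption made for this sketch.
    private volatile CountDownLatch firstConnect;
    private volatile boolean closing;

    void connect() {
        firstConnect = new CountDownLatch(1);
    }

    /** Unguarded variant: throws NullPointerException if connect() was never called. */
    void onClosingUnguarded() {
        firstConnect.countDown();
        closing = true;
    }

    /** Guarded variant: read the field once, then skip the latch if it was never created. */
    void onClosingGuarded() {
        CountDownLatch latch = firstConnect;
        if (latch != null) {
            latch.countDown();
        }
        closing = true;
    }

    boolean isClosing() {
        return closing;
    }
}

class SketchSocketDemo {
    public static void main(String[] args) {
        SketchSocket unguarded = new SketchSocket();
        try {
            unguarded.onClosingUnguarded(); // connect() was never called
        } catch (NullPointerException e) {
            System.out.println("unguarded close before connect: NullPointerException");
        }

        SketchSocket guarded = new SketchSocket();
        guarded.onClosingGuarded(); // connect() was never called
        System.out.println("guarded close before connect: closing=" + guarded.isClosing());
    }
}
```

Copying the field into a local variable before the check keeps the check and the countDown() call on the same reference. The sketch does not answer whether connect(...) and onClosing() also need to be ordered relative to each other, which is the open synchronization question.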
The code in 3.5.3 looks very similar, so it looks like the bug has been present since the initial commit of ClientCnxnSocketNetty.