Solr / SOLR-5359

CloudSolrServer tries to connect to zookeeper forever when ensemble is unavailable

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.5
    • Fix Version/s: 4.6, Trunk
    • Component/s: clients - java
    • Labels: None

      Description

      When opening a new CloudSolrServer against an unavailable zookeeper ensemble, the following exception messages are logged:

      INFO [hybrisHTTP28-SendThread(localhost:2181)] [ClientCnxn] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
      WARN [hybrisHTTP28-SendThread(localhost:2181)] [ClientCnxn] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
      java.net.ConnectException: Connection refused
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
      at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
      INFO [hybrisHTTP28-SendThread(localhost:2181)] [ClientCnxn] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
      WARN [hybrisHTTP28-SendThread(localhost:2181)] [ClientCnxn] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
      java.net.ConnectException: Connection refused
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
      at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

      This is consistent with the behaviour of zkCli.sh, except that it never times out. zkCli.sh stops trying to connect after 30 seconds, but CloudSolrServer keeps attempting to connect to ZooKeeper and logs the above messages forever, regardless of ZkClientTimeout and ZkConnectTimeout.

      Calls to e.g. isAlive() do indeed time out, but that does not stop the underlying CloudSolrServer instance from trying to connect.

      It does not seem to be possible to set a different zkHost on an existing CloudSolrServer instance either, so once an instance is created with a bad/wrong zkHost string, it seems impossible to destroy.
      Even if the zkHost is correct and the ensemble is merely down, one has to keep the CloudSolrServer around rather than discard it after a failed connection attempt. Otherwise each try creates a new ZkClient that then attempts to connect forever, leading to more and more connection attempts, as the clients never stop and are never garbage collected.

      I believe the CloudSolrServer/ZkClient should stop trying to connect altogether after a timeout and be garbage collected.
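
      The accumulation described above can be illustrated without Solr or ZooKeeper. The following is only a minimal sketch using the JDK: the class `LeakedRetryThreads`, its methods, and the thread-name prefix are hypothetical stand-ins, not SolrJ or ZooKeeper API. It assumes that each discarded client leaves behind a background thread that keeps retrying. A running thread stays reachable by the JVM, so dropping the caller's reference does not make it (or anything it references) eligible for garbage collection.

```java
import java.util.ArrayList;
import java.util.List;

class LeakedRetryThreads {
    // Hypothetical name prefix so the sketch can find its own threads again.
    static final String PREFIX = "sketch-SendThread-";

    // Stand-in for "create a client, fail to connect, discard the client":
    // the background thread keeps retrying after the caller's reference is gone.
    static void openAndDiscard(int id) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(10); // simulated "connection refused, retry"
                } catch (InterruptedException e) {
                    return; // only an explicit stop ends the loop
                }
            }
        }, PREFIX + id);
        t.setDaemon(true);
        t.start();
        // 't' goes out of scope here, but the thread itself keeps running.
    }

    // Lists the retry threads that are still alive in this JVM.
    static List<Thread> liveRetryThreads() {
        List<Thread> live = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getName().startsWith(PREFIX)) {
                live.add(t);
            }
        }
        return live;
    }

    // Explicitly stops every leftover retry thread and waits for it to exit.
    static void stopAll() throws InterruptedException {
        for (Thread t : liveRetryThreads()) {
            t.interrupt();
            t.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            openAndDiscard(i);
        }
        System.out.println("live retry threads after discarding clients: " + liveRetryThreads().size());
        stopAll();
        System.out.println("live retry threads after explicit stop: " + liveRetryThreads().size());
    }
}
```

      In this sketch the threads disappear only once something interrupts them explicitly, which mirrors the request above that the client stop on its own after a timeout.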

        Activity

        Klaus Herrmann added a comment -

        I also tried
        cloudSolrServer.shutdown() and cloudSolrServer.getZkStateReader().close() - but no luck either.
        Might I be missing something else?

        Mark Miller added a comment -

        Patch with test and fix.

        ASF subversion and git services added a comment -

        Commit 1533786 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1533786 ]

        SOLR-5359: ZooKeeper client is not closed when it fails to connect to an ensemble.
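
        The commit message suggests the general pattern "close the client if the initial connect does not succeed in time". The following is a hedged sketch of that pattern in plain Java, not the actual patch: `ConnectOrCloseSketch`, `RetryingClient`, and `connectOrClose` are hypothetical names, and the "client" is only a background thread that simulates reconnect attempts.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ConnectOrCloseSketch {

    /** Hypothetical stand-in for a client whose background thread retries until closed. */
    static class RetryingClient implements AutoCloseable {
        private final CountDownLatch connected = new CountDownLatch(1);
        private final Thread retryThread;

        RetryingClient(boolean serverReachable) {
            retryThread = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    if (serverReachable) {
                        connected.countDown(); // simulated successful connect
                    }
                    try {
                        Thread.sleep(10); // simulated pause between attempts
                    } catch (InterruptedException e) {
                        return; // close() was called
                    }
                }
            }, "sketch-retry-thread");
            retryThread.setDaemon(true);
            retryThread.start();
        }

        // Waits up to timeoutMs for the simulated connection; returns false on timeout.
        boolean awaitConnected(long timeoutMs) throws InterruptedException {
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        }

        boolean isRunning() {
            return retryThread.isAlive();
        }

        // Stops the background thread and waits for it to exit.
        @Override
        public void close() throws InterruptedException {
            retryThread.interrupt();
            retryThread.join();
        }
    }

    /** Returns the connected client, or closes it and throws if the wait times out. */
    static RetryingClient connectOrClose(RetryingClient client, long timeoutMs) throws Exception {
        boolean success = false;
        try {
            if (!client.awaitConnected(timeoutMs)) {
                throw new TimeoutException("no connection within " + timeoutMs + " ms");
            }
            success = true;
            return client;
        } finally {
            if (!success) {
                client.close(); // the step that keeps retries from running forever
            }
        }
    }

    public static void main(String[] args) throws Exception {
        RetryingClient client = new RetryingClient(false);
        try {
            connectOrClose(client, 100);
        } catch (TimeoutException e) {
            System.out.println("gave up: " + e.getMessage());
        }
        System.out.println("retry thread still running: " + client.isRunning());
    }
}
```

        Putting the cleanup in a `finally` block guarded by a success flag means the client is also closed if the wait is interrupted, not only when it times out.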

        ASF subversion and git services added a comment -

        Commit 1533788 from Mark Miller in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1533788 ]

        SOLR-5359: ZooKeeper client is not closed when it fails to connect to an ensemble.

        Mark Miller added a comment -

        Thanks Klaus!


          People

          • Assignee: Mark Miller
          • Reporter: Klaus Herrmann
          • Votes: 0
          • Watchers: 4
