SOLR-8599: Errors in construction of SolrZooKeeper cause Solr to go into an inconsistent state

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5.1, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      We originally saw this happen due to a DNS exception (see stack trace below), but any exception thrown in the constructor of SolrZooKeeper or its parent class, ZooKeeper, will cause DefaultConnectionStrategy to fail to update the ZooKeeper client. Once Solr gets into this state, it will not try to connect again until the process is restarted. The node itself will still respond successfully to query requests, but not to update requests.

      Two things should be addressed here:
      1) Fix the error handling and issue some number of retries.
      2) If we are stuck in a state like this, stop responding to all requests.
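
To illustrate item 1, here is a minimal sketch of a bounded retry around client construction. It is not taken from the attached patches: the class `ReconnectRetrySketch`, the helper `createWithRetries`, and its parameters are hypothetical names, and the `Callable` stands in for whatever call constructs the SolrZooKeeper instance. The `main` method simulates a lookup that fails twice with an UnknownHostException before succeeding.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

// Hypothetical sketch; none of these names come from Solr or ZooKeeper.
class ReconnectRetrySketch {

    /**
     * Calls the factory up to maxAttempts times, sleeping delayMillis between
     * failed attempts. Rethrows the last failure if every attempt fails.
     */
    static <T> T createWithRetries(Callable<T> factory, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for constructing the client: fails twice, then succeeds.
        String client = createWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new UnknownHostException("HOSTNAME: unknown error");
            }
            return "connected-client";
        }, 5, 10L);
        System.out.println(client + " after " + calls[0] + " attempts");
    }
}
```

With a wrapper like this, a transient DNS failure would no longer leave the node without a client after a single failed attempt. If all attempts fail, the caller still sees the exception and can act on item 2.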

      2016-01-23 13:49:20.222 ERROR ConnectionManager [main-EventThread] - :java.net.UnknownHostException: HOSTNAME: unknown error
      at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
      at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
      at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
      at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
      at java.net.InetAddress.getAllByName(InetAddress.java:1192)
      at java.net.InetAddress.getAllByName(InetAddress.java:1126)
      at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
      at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
      at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
      at org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:41)
      at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
      at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:132)
      at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
      at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
      2016-01-23 13:49:20.222 INFO ConnectionManager [main-EventThread] - Connected:false
      2016-01-23 13:49:20.222 INFO ClientCnxn [main-EventThread] - EventThread shut down
      

        Attachments

        1. SOLR-8599.patch
          7 kB
          Keith Laban
        2. SOLR-8599.patch
          5 kB
          Keith Laban
        3. SOLR-8599.patch
          8 kB
          Dennis Gove
        4. SOLR-8599.patch
          4 kB
          Keith Laban

            People

            • Assignee:
              dpgove Dennis Gove
              Reporter:
              k317h Keith Laban
            • Votes:
              0
              Watchers:
              7