Solr / SOLR-6923

AutoAddReplicas should consult live nodes also to see if a state has changed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: SolrCloud
    • Labels: None

      Description

      • I did the following:
        ./solr start -e cloud -noprompt

        kill -9 <pid-of-node2> // not the node which is running ZK
        
      • /live_nodes reflects that the node is gone.
      • This is the only message that gets logged on the node1 server after killing node2:
      45812 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN  org.apache.zookeeper.server.NIOServerCnxn  – caught end of stream exception
      EndOfStreamException: Unable to read additional data from client sessionid 0x14ac40f26660001, likely client has closed socket
          at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
          at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
          at java.lang.Thread.run(Thread.java:745)
      
      • The graph shows node2 in the 'Gone' state
      • clusterstate.json keeps showing the replica as 'active'
      {"collection1":{
          "shards":{"shard1":{
              "range":"80000000-7fffffff",
              "state":"active",
              "replicas":{
                "core_node1":{
                  "state":"active",
                  "core":"collection1",
                  "node_name":"169.254.113.194:8983_solr",
                  "base_url":"http://169.254.113.194:8983/solr",
                  "leader":"true"},
                "core_node2":{
                  "state":"active",
                  "core":"collection1",
                  "node_name":"169.254.113.194:8984_solr",
                  "base_url":"http://169.254.113.194:8984/solr"}}}},
          "maxShardsPerNode":"1",
          "router":{"name":"compositeId"},
          "replicationFactor":"1",
          "autoAddReplicas":"false",
          "autoCreated":"true"}}
      

      One immediate problem I can see is that AutoAddReplicas doesn't work, since clusterstate.json never changes. There might be more features affected by this.
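      To make the mismatch concrete, here is a minimal sketch (not Solr's implementation) that reads /clusterstate.json and /live_nodes straight from the embedded ZooKeeper and flags a replica whose node is gone even though its recorded state is still 'active'. The connect string, the node_name and the string-based check are placeholders standing in for real JSON parsing:

      import java.nio.charset.StandardCharsets;
      import java.util.List;
      import java.util.concurrent.CountDownLatch;

      import org.apache.zookeeper.Watcher;
      import org.apache.zookeeper.ZooKeeper;

      // Sketch only: cross-check clusterstate.json against /live_nodes using the
      // plain ZooKeeper client. The connect string and node_name come from the
      // example above; real code would parse the JSON properly.
      public class LiveNodesCrossCheck {
        public static void main(String[] args) throws Exception {
          CountDownLatch connected = new CountDownLatch(1);
          ZooKeeper zk = new ZooKeeper("127.0.0.1:9983", 15000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
              connected.countDown();
            }
          });
          connected.await();
          try {
            // Children of /live_nodes are the node_name values of the nodes that are up.
            List<String> liveNodes = zk.getChildren("/live_nodes", false);

            String clusterState = new String(
                zk.getData("/clusterstate.json", false, null), StandardCharsets.UTF_8);

            String nodeName = "169.254.113.194:8984_solr"; // core_node2 from the report
            boolean claimedActive = clusterState.contains("\"state\":\"active\"");
            boolean nodeIsLive = liveNodes.contains(nodeName);

            // After kill -9 on node2 this prints: active=true, live=false.
            System.out.println("clusterstate says active: " + claimedActive
                + ", node in /live_nodes: " + nodeIsLive);
          } finally {
            zk.close();
          }
        }
      }

      Anything keyed off the 'state' field alone keeps treating core_node2 as healthy, which is why AutoAddReplicas never kicks in here.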

      On first thought, I think we can handle this: the shard leader could listen for changes on /live_nodes and, if it has replicas that were on the departed node, mark them as 'down' in clusterstate.json?
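      A rough sketch of that idea, using the raw ZooKeeper watcher API rather than Solr's internals; markReplicasDown() is a hypothetical hook standing in for whatever would publish the 'down' state:

      import java.util.HashSet;
      import java.util.Set;

      import org.apache.zookeeper.WatchedEvent;
      import org.apache.zookeeper.Watcher;
      import org.apache.zookeeper.ZooKeeper;

      // Watch /live_nodes and, whenever a node drops out, hand its name to a
      // callback that could mark the replicas hosted on it as 'down'.
      public class LiveNodesWatcher implements Watcher {
        private final ZooKeeper zk;
        private volatile Set<String> lastSeen = new HashSet<>();

        public LiveNodesWatcher(ZooKeeper zk) {
          this.zk = zk;
        }

        public void start() throws Exception {
          refresh();
        }

        private void refresh() throws Exception {
          // ZK watches are one-shot, so re-register on every change and diff the children.
          Set<String> current = new HashSet<>(zk.getChildren("/live_nodes", this));
          for (String node : lastSeen) {
            if (!current.contains(node)) {
              markReplicasDown(node);
            }
          }
          lastSeen = current;
        }

        private void markReplicasDown(String nodeName) {
          // Hypothetical hook: publish state=down for every replica in
          // clusterstate.json whose node_name equals nodeName.
          System.out.println("node gone, its replicas should be marked down: " + nodeName);
        }

        @Override
        public void process(WatchedEvent event) {
          if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
            try {
              refresh();
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        }
      }

      The sketch only covers the watch-and-diff part; actually publishing the 'down' state would still have to go through the normal cluster state update path.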


            People

            • Assignee: Mark Miller (markrmiller@gmail.com)
            • Reporter: Varun Thacker (varunthacker)
            • Votes: 0
            • Watchers: 10
