Solr / SOLR-6923

AutoAddReplicas should consult live nodes also to see if a state has changed


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: SolrCloud
    • Labels: None

    Description

      • I did the following:
        ./solr start -e cloud -noprompt
        
        kill -9 <pid-of-node2>  # not the node that is running ZK
        
      • /live_nodes reflects that the node is gone.
      • This is the only message logged on the node1 server after killing node2:
      45812 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN  org.apache.zookeeper.server.NIOServerCnxn  – caught end of stream exception
      EndOfStreamException: Unable to read additional data from client sessionid 0x14ac40f26660001, likely client has closed socket
          at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
          at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
          at java.lang.Thread.run(Thread.java:745)
      
      • The cloud graph in the Admin UI shows node2 in the 'Gone' state.
      • clusterstate.json keeps showing the replica as 'active' (a programmatic check is sketched after the JSON below):
      {"collection1":{
          "shards":{"shard1":{
              "range":"80000000-7fffffff",
              "state":"active",
              "replicas":{
                "core_node1":{
                  "state":"active",
                  "core":"collection1",
                  "node_name":"169.254.113.194:8983_solr",
                  "base_url":"http://169.254.113.194:8983/solr",
                  "leader":"true"},
                "core_node2":{
                  "state":"active",
                  "core":"collection1",
                  "node_name":"169.254.113.194:8984_solr",
                  "base_url":"http://169.254.113.194:8984/solr"}}}},
          "maxShardsPerNode":"1",
          "router":{"name":"compositeId"},
          "replicationFactor":"1",
          "autoAddReplicas":"false",
          "autoCreated":"true"}}
      

      One immediate problem I can see is that AutoAddReplicas doesn't work, since clusterstate.json never changes. Other features may be affected as well.

      On first thought, I think we can handle this: the shard leader could watch /live_nodes and, if it has replicas that were on a node that went away, mark them as 'down' in clusterstate.json.
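
      A rough sketch of that idea against the plain ZooKeeper API, to make the shape concrete. The watch re-arming is the standard pattern; the actual down-marking is stubbed out, since in Solr it would have to go through the Overseer's state-update queue rather than a direct write, and the class and method names here are hypothetical:

      import java.util.HashSet;
      import java.util.List;
      import java.util.Set;
      import org.apache.zookeeper.Watcher.Event.EventType;
      import org.apache.zookeeper.ZooKeeper;

      // Sketch: watch /live_nodes, diff against the previous snapshot, and
      // flag replicas on any node that vanished. Publishing 'down' is stubbed.
      public class LiveNodesWatcher {
        private final ZooKeeper zk;
        private Set<String> lastLive = new HashSet<>();

        public LiveNodesWatcher(ZooKeeper zk) {
          this.zk = zk;
        }

        public synchronized void refresh() throws Exception {
          // Passing a Watcher re-arms the watch on every children change.
          List<String> children = zk.getChildren("/live_nodes", event -> {
            if (event.getType() == EventType.NodeChildrenChanged) {
              try {
                refresh();
              } catch (Exception e) {
                e.printStackTrace();
              }
            }
          });
          Set<String> current = new HashSet<>(children);
          for (String node : lastLive) {
            if (!current.contains(node)) {
              onNodeLost(node); // e.g. 169.254.113.194:8984_solr
            }
          }
          lastLive = current;
        }

        private void onNodeLost(String nodeName) {
          // Stub: the shard leader would find its replicas whose node_name
          // matches and request a state change to 'down' for each of them.
          System.out.println("node gone, mark its replicas down: " + nodeName);
        }
      }

      The leader would call refresh() once at startup; after that the watch keeps itself armed across changes.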


          People

            Assignee: Mark Miller (markrmiller@gmail.com)
            Reporter: Varun Thacker (varun)
            Votes: 0
            Watchers: 10
