SOLR-12187

Replica should watch clusterstate and unload itself if its entry is removed

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.4
    • Component/s: None
    • Labels: None

    Description

      Since the introduction of the autoscaling framework, we have seen an increase in the number of issues caused by race conditions between deleting a replica and other operations.

      Case 1: DeleteReplicaCmd fails to send an UNLOAD request to a replica and therefore forcefully removes the replica's entry from clusterstate, but the replica still functions normally and is able to become a leader -> SOLR-12176
      Case 2:

      • DeleteReplicaCmd enqueues a DELETECOREOP (without sending a request to the replica, because its node is not live)
      • The node starts and the replica gets loaded
      • The DELETECOREOP has not been processed yet, so the replica is still present in clusterstate --> it passes checkStateInZk
      • The DELETECOREOP is executed and DeleteReplicaCmd finishes
        • result 1: the replica starts recovering, finishes, and publishes itself as ACTIVE --> state of the replica is ACTIVE
        • result 2: the replica throws an exception (probably an NPE)
          --> state of the replica is DOWN, and it does not join leader election
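
      The behavior proposed in the issue title (a replica watches clusterstate and unloads itself once its entry is removed) can be sketched roughly as follows. This is a simplified, self-contained model and not the attached patch: ClusterState, Replica, publish, removeEntry, and the replica name core_node1 are hypothetical stand-ins, and no Solr or ZooKeeper API is used.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical, simplified model; none of these types are Solr or ZooKeeper classes.
final class ReplicaWatchSketch {

  /** Stand-in for clusterstate: replica name -> published state. */
  static final class ClusterState {
    private final Map<String, String> replicas = new ConcurrentHashMap<>();
    private final List<Consumer<ClusterState>> watchers = new CopyOnWriteArrayList<>();

    void publish(String replica, String state) {
      replicas.put(replica, state);
      notifyWatchers();
    }

    // Models the queued delete operation finally being processed.
    void removeEntry(String replica) {
      replicas.remove(replica);
      notifyWatchers();
    }

    boolean contains(String replica) {
      return replicas.containsKey(replica);
    }

    void watch(Consumer<ClusterState> watcher) {
      watchers.add(watcher);
    }

    void unwatch(Consumer<ClusterState> watcher) {
      watchers.remove(watcher);
    }

    private void notifyWatchers() {
      for (Consumer<ClusterState> watcher : watchers) {
        watcher.accept(this);
      }
    }
  }

  /** Stand-in for a replica that unloads itself when its entry disappears. */
  static final class Replica {
    private final String name;
    private final ClusterState clusterState;
    private final Consumer<ClusterState> watcher = this::onClusterStateChanged;
    private volatile boolean loaded;

    Replica(String name, ClusterState clusterState) {
      this.name = name;
      this.clusterState = clusterState;
    }

    void load() {
      // One-time start-up check: on its own it cannot see a delete that runs later.
      if (!clusterState.contains(name)) {
        throw new IllegalStateException("no clusterstate entry for " + name);
      }
      loaded = true;
      clusterState.watch(watcher);
      // Re-check once the watch is registered, in case the entry was removed in between.
      onClusterStateChanged(clusterState);
    }

    private void onClusterStateChanged(ClusterState state) {
      if (loaded && !state.contains(name)) {
        unload();
      }
    }

    void unload() {
      loaded = false;
      clusterState.unwatch(watcher);
    }

    boolean isLoaded() {
      return loaded;
    }
  }

  public static void main(String[] args) {
    ClusterState state = new ClusterState();
    state.publish("core_node1", "DOWN");   // entry still present, delete is only queued
    Replica replica = new Replica("core_node1", state);
    replica.load();                        // passes the start-up check, as in Case 2
    state.publish("core_node1", "ACTIVE"); // replica recovers and publishes ACTIVE
    state.removeEntry("core_node1");       // the queued delete is executed
    System.out.println("loaded after delete: " + replica.isLoaded());
  }
}
```

      In this model the start-up check alone would leave the replica loaded after its entry is removed, as in Case 2; the watch lets the replica react to the later removal.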

      Attachments

        1. SOLR-12187.patch
          32 kB
          Cao Manh Dat
        2. SOLR-12187.patch
          31 kB
          Cao Manh Dat
        3. SOLR-12187.patch
          29 kB
          Cao Manh Dat
        4. SOLR-12187.patch
          27 kB
          Cao Manh Dat
        5. SOLR-12187.patch
          26 kB
          Cao Manh Dat
        6. SOLR-12187.patch
          11 kB
          Cao Manh Dat

          People

            Assignee: Cao Manh Dat (caomanhdat)
            Reporter: Cao Manh Dat (caomanhdat)
            Votes: 0
            Watchers: 6
