SOLR-12011: Consistency problem when in-sync replicas are DOWN

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.3, 8.0
    • Component/s: SolrCloud
    • Labels: None

      Description

      Currently, a consistency problem arises when in-sync replicas are DOWN. For example:
      1. A collection has 1 shard with 1 leader and 2 replicas.
      2. The nodes hosting the 2 replicas go down.
      3. The leader receives update A and applies it successfully.
      4. The node hosting the leader goes down.
      5. The 2 replicas come back.
      6. One of them becomes leader, although neither should be eligible because both missed update A.
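
The scenario can be replayed with a small toy model. This is only a sketch, not Solr code: `ToyReplica`, `StaleLeaderScenario`, `electNaively`, and the node names are hypothetical, and the election rule (first live replica wins) is a deliberate simplification chosen to show how a replica that missed update A can end up as leader.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (assumption, not Solr code): a replica is a name,
// a liveness flag, and the list of updates it has applied.
final class ToyReplica {
    final String name;
    boolean live = true;
    final List<String> updates = new ArrayList<>();

    ToyReplica(String name) {
        this.name = name;
    }
}

final class StaleLeaderScenario {
    // Naive election: the first live replica wins, regardless of
    // which updates it has seen.
    static ToyReplica electNaively(List<ToyReplica> replicas) {
        for (ToyReplica r : replicas) {
            if (r.live) {
                return r;
            }
        }
        return null;
    }

    // Replays steps 1-6 and returns the replica that ends up as leader.
    static ToyReplica replay() {
        ToyReplica leader = new ToyReplica("core_node1");   // step 1
        ToyReplica r2 = new ToyReplica("core_node2");
        ToyReplica r3 = new ToyReplica("core_node3");
        List<ToyReplica> shard = List.of(leader, r2, r3);

        r2.live = false;                                    // step 2
        r3.live = false;
        leader.updates.add("A");                            // step 3
        leader.live = false;                                // step 4
        r2.live = true;                                     // step 5
        r3.live = true;
        return electNaively(shard);                         // step 6
    }

    public static void main(String[] args) {
        ToyReplica elected = replay();
        // prints "core_node2 has update A: false"
        System.out.println(elected.name + " has update A: "
                + elected.updates.contains("A"));
    }
}
```

The elected replica has never seen update A, which is the inconsistency this issue describes.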

      A solution to this issue:

      • The idea is that the term value of each replica (SOLR-11702) is enough to tell whether a replica has received the latest updates. Therefore, only replicas with the highest term can become leader.
      • A couple of things need to be done for this issue:
        • When the leader receives its first update, its term should change from 0 to 1, so that replicas added to the same shard later (term = 0) cannot become leader until they finish recovery.
        • For DOWN replicas, the leader also needs to check (in DUP.finish()) that those replicas have a term lower than the leader's before returning results to users.
        • The term value alone is not enough to tell whether a replica is in sync with the leader, because the replica might not have finished the recovery process. We need to introduce another flag (stored on the shard term node in ZK) that indicates whether the replica has finished recovery. It will look like this:
          • {"core_node1" : 1, "core_node2" : 0}

            --- (when core_node2 starts recovery) --->

          • {"core_node1" : 1, "core_node2" : 1, "core_node2_recovering" : 1}

            --- (when core_node2 finishes recovery) --->

          • {"core_node1" : 1, "core_node2" : 1}
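
The term transitions and the eligibility rule proposed above can be sketched with an in-memory map. This is a minimal illustration under stated assumptions, not the attached patch: `ToyShardTerms` and all of its method names are hypothetical, the map stands in for the shard term node in ZK, and `isBehind` only mirrors the kind of comparison the leader would make for DOWN replicas.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the per-shard term data;
// in the proposal this state is stored on the shard term node in ZK.
final class ToyShardTerms {
    private static final String RECOVERING_SUFFIX = "_recovering";
    private final Map<String, Long> terms = new TreeMap<>();

    // New replicas start with term 0.
    void register(String coreNode) {
        terms.putIfAbsent(coreNode, 0L);
    }

    // The first update moves the leader's term from 0 to 1.
    void onFirstUpdate(String leader) {
        if (terms.getOrDefault(leader, 0L) == 0L) {
            terms.put(leader, 1L);
        }
    }

    long maxTerm() {
        return terms.isEmpty() ? 0L : Collections.max(terms.values());
    }

    // Starting recovery raises the replica's term to the highest term
    // and records a "<name>_recovering" flag next to it.
    void startRecovery(String coreNode) {
        long max = maxTerm();
        terms.put(coreNode, max);
        terms.put(coreNode + RECOVERING_SUFFIX, max);
    }

    // Finishing recovery clears the flag.
    void finishRecovery(String coreNode) {
        terms.remove(coreNode + RECOVERING_SUFFIX);
    }

    // Comparison a leader could make for a DOWN replica before
    // returning results to the user.
    boolean isBehind(String coreNode, String leader) {
        return terms.getOrDefault(coreNode, 0L) < terms.getOrDefault(leader, 0L);
    }

    // Eligible only with the highest term and no recovering flag.
    boolean canBecomeLeader(String coreNode) {
        Long term = terms.get(coreNode);
        return term != null
                && term == maxTerm()
                && !terms.containsKey(coreNode + RECOVERING_SUFFIX);
    }

    Map<String, Long> snapshot() {
        return new TreeMap<>(terms);
    }

    public static void main(String[] args) {
        ToyShardTerms t = new ToyShardTerms();
        t.register("core_node1");
        t.register("core_node2");
        t.onFirstUpdate("core_node1");
        System.out.println(t.snapshot()); // {core_node1=1, core_node2=0}
        t.startRecovery("core_node2");
        System.out.println(t.snapshot()); // {core_node1=1, core_node2=1, core_node2_recovering=1}
        t.finishRecovery("core_node2");
        System.out.println(t.snapshot()); // {core_node1=1, core_node2=1}
    }
}
```

In this sketch `canBecomeLeader("core_node2")` is false in the first two states and true only in the last one, which matches the intent that a replica must hold the highest term and have finished recovery.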

        Attachments

        1. SOLR-12011.patch (55 kB, Cao Manh Dat)
        2. SOLR-12011.patch (57 kB, Cao Manh Dat)
        3. SOLR-12011.patch (54 kB, Cao Manh Dat)
        4. SOLR-12011.patch (52 kB, Cao Manh Dat)
        5. SOLR-12011.patch (52 kB, Cao Manh Dat)

              People

              • Assignee: Cao Manh Dat (caomanhdat)
              • Reporter: Cao Manh Dat (caomanhdat)
              • Votes: 0
              • Watchers: 5
