Solr / SOLR-10889

Stale ZooKeeper information is used during failover check


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      OverseerAutoReplicaFailoverThread goes over every replica to check whether it needs to be reloaded on a new node. In each round it reads the cluster state only once, at the beginning. In big clusters especially, the cluster state may change while the thread is iterating through the replicas. As a result, wrong decisions may be made: restarting a healthy core, or not handling a bad node.
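The effect can be reproduced outside Solr with a small model. The sketch below is hypothetical and is not Solr code: a `Map<String, Boolean>` from node name to liveness stands in for the cluster state, each list entry stands in for a replica, and the `midLoopChange` callback simulates the cluster changing while the loop runs. Because every decision is made from the snapshot taken at the start, the loop restarts a node that has already recovered and ignores one that has just failed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of one failover round; not Solr code.
// A Map<String, Boolean> (node name -> live?) stands in for the cluster state.
public class StaleSnapshotDemo {

    // Decides which replicas to restart from a single snapshot taken before the loop,
    // mirroring the "read the cluster state once per round" behavior described above.
    static List<String> decideFromSnapshot(Map<String, Boolean> live,
                                           List<String> replicaNodes,
                                           Runnable midLoopChange) {
        Map<String, Boolean> snapshot = new HashMap<>(live); // read once, at the start
        List<String> toRestart = new ArrayList<>();
        for (int i = 0; i < replicaNodes.size(); i++) {
            if (i == 1) {
                midLoopChange.run(); // the real cluster changes while we iterate
            }
            String node = replicaNodes.get(i);
            if (!snapshot.getOrDefault(node, false)) {
                toRestart.add(node);
            }
        }
        return toRestart;
    }

    public static void main(String[] args) {
        Map<String, Boolean> live = new HashMap<>();
        live.put("node1", true);
        live.put("node2", false); // down when the snapshot is taken
        live.put("node3", true);
        List<String> decision = decideFromSnapshot(live, List.of("node1", "node2", "node3"), () -> {
            live.put("node2", true);  // node2 recovers...
            live.put("node3", false); // ...and node3 fails, mid-round
        });
        // Prints [node2]: healthy node2 is restarted, failed node3 is not handled.
        System.out.println(decision);
    }
}
```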

      The code fragment in question:

              // clusterState was read once, before this loop started
              for (Slice slice : slices) {
                if (slice.getState() == Slice.State.ACTIVE) {
                  final Collection<DownReplica> downReplicas = new ArrayList<DownReplica>();
                  int goodReplicas = findDownReplicasInSlice(clusterState, docCollection, slice, downReplicas);
                  // ...
                }
              }
      

      A straightforward solution seems to be re-reading the state for every slice:

                  int goodReplicas = findDownReplicasInSlice(zkStateReader.getClusterState(), docCollection, slice, downReplicas);
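The proposed fix can be modeled the same way. In this hypothetical sketch (not Solr code), a `Supplier` stands in for `zkStateReader.getClusterState()`: each call returns a copy of the state at that moment, each list entry stands in for a slice, and a counter records how many reads one round costs, since read frequency is the main concern with this approach.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of the proposed fix; not Solr code. The Supplier plays the role of
// zkStateReader.getClusterState(): each get() returns the state as of that moment.
public class FreshReadDemo {

    static List<String> decideWithFreshReads(Supplier<Map<String, Boolean>> readState,
                                             List<String> replicaNodes,
                                             Runnable midLoopChange) {
        List<String> toRestart = new ArrayList<>();
        for (int i = 0; i < replicaNodes.size(); i++) {
            if (i == 1) {
                midLoopChange.run(); // the cluster changes while we iterate
            }
            Map<String, Boolean> state = readState.get(); // fresh read for every entry
            String node = replicaNodes.get(i);
            if (!state.getOrDefault(node, false)) {
                toRestart.add(node);
            }
        }
        return toRestart;
    }

    public static void main(String[] args) {
        Map<String, Boolean> live = new HashMap<>();
        live.put("node1", true);
        live.put("node2", false);
        live.put("node3", true);
        AtomicInteger reads = new AtomicInteger();
        Supplier<Map<String, Boolean>> reader = () -> {
            reads.incrementAndGet();
            return new HashMap<>(live); // a copy, like an immutable state object
        };
        List<String> decision = decideWithFreshReads(reader, List.of("node1", "node2", "node3"), () -> {
            live.put("node2", true);
            live.put("node3", false);
        });
        // Prints "[node3] after 3 reads": only the node that is actually down is handled.
        System.out.println(decision + " after " + reads.get() + " reads");
    }
}
```

The price is one state read per entry; in Solr the per-read cost depends on how `getClusterState()` is backed, which this model does not capture.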
      

      The only counterargument I can think of is that the cluster state would be read too frequently. We could refine this naive solution so that the state is re-read only when a bad node is found, but I am not sure such an optimization is necessary.
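A sketch of that refinement, again as a hypothetical model rather than Solr code: the state read at the start of the round is reused, and a fresh read happens only when a replica looks down in it, so any decision to act is confirmed against current state. The names and the `Supplier`-based reader are assumptions carried over from a simplified model of the loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of the "re-read only if a bad node is found" refinement; not Solr code.
public class LazyReReadDemo {

    static List<String> decideLazily(Supplier<Map<String, Boolean>> readState,
                                     List<String> replicaNodes,
                                     Runnable midLoopChange) {
        Map<String, Boolean> state = readState.get(); // one read at the start of the round
        List<String> toRestart = new ArrayList<>();
        for (int i = 0; i < replicaNodes.size(); i++) {
            if (i == 1) {
                midLoopChange.run(); // the cluster changes while we iterate
            }
            String node = replicaNodes.get(i);
            if (!state.getOrDefault(node, false)) {
                // Looks down in the cached state: confirm with a fresh read before acting,
                // and keep the fresh state for the remaining replicas.
                state = readState.get();
                if (!state.getOrDefault(node, false)) {
                    toRestart.add(node);
                }
            }
        }
        return toRestart;
    }

    public static void main(String[] args) {
        Map<String, Boolean> live = new HashMap<>();
        live.put("node1", true);
        live.put("node2", false); // down at the start of the round
        live.put("node3", true);
        AtomicInteger reads = new AtomicInteger();
        Supplier<Map<String, Boolean>> reader = () -> {
            reads.incrementAndGet();
            return new HashMap<>(live);
        };
        List<String> decision = decideLazily(reader, List.of("node1", "node2", "node3"),
                () -> live.put("node2", true)); // node2 recovers mid-round
        // Prints "[] after 2 reads": the recovered node2 is not restarted.
        System.out.println(decision + " after " + reads.get() + " reads");
    }
}
```

One trade-off of this variant: a node that fails after the initial read, but never looks down in the cached state, is not handled until the next round. It therefore closes the "restarting a healthy core" case more completely than the "not handling a bad node" case.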

      I have written some unit tests around this class that mock out even the time factor, so they run in about a second. I would welcome feedback on this approach. I will upload a patch with these changes shortly.
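The issue does not show how the tests mock out time. One common pattern, sketched here as an assumption rather than as what the patch does, is to have the time-dependent logic take a `java.time.Clock` and to pass a test clock that only advances when told to. The `DownNodeTracker` class and its grace period are hypothetical illustrations, not part of Solr.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: time-dependent failover logic takes a Clock, so a test can
// advance time instantly instead of sleeping. Not Solr code.
public class MockedTimeDemo {

    // A Clock whose time only moves when the test says so.
    static final class MutableClock extends Clock {
        private Instant now;
        MutableClock(Instant start) { this.now = start; }
        void advance(Duration d) { now = now.plus(d); }
        @Override public Instant instant() { return now; }
        @Override public ZoneId getZone() { return ZoneOffset.UTC; }
        @Override public Clock withZone(ZoneId zone) { return this; } // zone is irrelevant here
    }

    // Declares a node dead only after it has been down for at least the grace period.
    static final class DownNodeTracker {
        private final Clock clock;
        private final Duration gracePeriod;
        private final Map<String, Instant> firstSeenDown = new HashMap<>();

        DownNodeTracker(Clock clock, Duration gracePeriod) {
            this.clock = clock;
            this.gracePeriod = gracePeriod;
        }

        // Returns true when the node should be failed over.
        boolean shouldFailOver(String node, boolean live) {
            if (live) {
                firstSeenDown.remove(node); // recovered: reset its timer
                return false;
            }
            Instant since = firstSeenDown.computeIfAbsent(node, n -> clock.instant());
            return Duration.between(since, clock.instant()).compareTo(gracePeriod) >= 0;
        }
    }

    public static void main(String[] args) {
        MutableClock clock = new MutableClock(Instant.EPOCH);
        DownNodeTracker tracker = new DownNodeTracker(clock, Duration.ofSeconds(30));
        System.out.println(tracker.shouldFailOver("node2", false)); // false: just went down
        clock.advance(Duration.ofSeconds(31));                      // no real sleeping
        System.out.println(tracker.shouldFailOver("node2", false)); // true: grace period elapsed
    }
}
```

Because the test advances the clock instead of sleeping, a grace period of any length costs no wall-clock time, which is one way a suite like this can finish in about a second.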

        Attachments

        1. SOLR-10889.patch (25 kB, Mihaly Toth)


            People

            • Assignee: Mark Miller (markrmiller@gmail.com)
            • Reporter: Mihaly Toth (mihaly.toth)
            • Votes: 0
            • Watchers: 2
