Geode / GEODE-9822

Split-brain Certain During Network Partition in Two-Locator Cluster

Details

    Description

      In a two-locator cluster with default member weights and the default setting (true) of enable-network-partition-detection, if a long-lived network partition separates the two members, a split-brain will arise: there will be two coordinators at the same time.

      The reason for this can be found in the GMSJoinLeave.isNetworkPartition() method. That method's name is misleading. A name like isMajorityLost() would probably be more apt. It needs to return true iff the weight of "crashed" members (in the prospective view) is greater-than-or-equal-to half (50%) of the total weight (of all members in the current view).
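A minimal Java sketch of the predicate this paragraph calls for. The class name MajorityCheck is hypothetical, the method name follows the isMajorityLost() suggestion above, and this is not Geode's actual code. Weights are passed as plain integers, and the comparison 2 * crashed >= total expresses "at least half" without any percentage rounding.

```java
/**
 * Hypothetical sketch of the quorum-loss predicate argued for in this ticket.
 * MajorityCheck is an illustrative name, not part of Geode's API.
 */
final class MajorityCheck {

  private MajorityCheck() {}

  /**
   * Returns true iff the weight of "crashed" members is at least half (50%)
   * of the total weight of all members in the current view.
   */
  static boolean isMajorityLost(int crashedWeight, int totalWeight) {
    if (totalWeight <= 0) {
      throw new IllegalArgumentException("totalWeight must be positive");
    }
    // Compare in integer arithmetic; use long so doubling cannot overflow.
    return 2L * crashedWeight >= totalWeight;
  }

  public static void main(String[] args) {
    // Two members of equal weight w: each sees the other (weight w) as
    // crashed out of a total of 2w, so each concludes the majority is lost.
    int w = 3; // illustrative weight; any equal positive value behaves the same
    System.out.println(isMajorityLost(w, 2 * w)); // true
    System.out.println(isMajorityLost(2, 6));     // false: only a third crashed
  }
}
```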

      What the method actually does is return true iff the weight of "crashed" members is greater-than 51% of the total weight. As a result, if we have two members of equal weight, and the coordinator sees that the non-coordinator is "crashed", the coordinator will keep running. If a network partition is happening, and the non-coordinator is still running, then it will become a coordinator and start producing views. Now we'll have two coordinators producing views concurrently.
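A simplified model of the check as described above (crashed weight strictly greater than 51% of the total), applied to two members of equal weight. DescribedCheck and quorumLostAsDescribed are hypothetical names, and the weight value 3 is an illustrative assumption; the sketch models the described arithmetic rather than reproducing GMSJoinLeave.isNetworkPartition().

```java
/**
 * Simplified model of the check described in this ticket: quorum is treated
 * as lost only when crashed weight is strictly greater than 51% of the total.
 * Illustrative only; not Geode's source code.
 */
final class DescribedCheck {

  private DescribedCheck() {}

  // crashed > 51% of total, compared in integer hundredths to avoid floating point.
  static boolean quorumLostAsDescribed(int crashedWeight, int totalWeight) {
    return 100L * crashedWeight > 51L * totalWeight;
  }

  public static void main(String[] args) {
    int w = 3; // illustrative equal weight for each of the two members
    int total = 2 * w;
    // Each side sees the other side's weight w as crashed. Since w is exactly
    // 50% of the total, which is not greater than 51%, no side stops.
    boolean eachSideStops = quorumLostAsDescribed(w, total);
    System.out.println("each side stops: " + eachSideStops); // false
  }
}
```

Because 50% never exceeds 51%, the same result holds for any equal positive weight, which is why both sides keep acting as coordinator.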

      For this discussion "crashed" members are members for which the coordinator has received a RemoveMemberRequest message. These are members that the failure detector has deemed failed. Keep in mind the failure detector is imperfect (it's not always right), and that's kind of the whole point of this ticket: we've lost contact with the non-coordinator member, but that doesn't mean it can't still be running (on the other side of a partition).

      This bug is not limited to the two-locator scenario. Any set of members that can be partitioned into two sets of equal weight is susceptible. In fact, it is a little worse than that: any partition of the members into two or more sets in which at least two of the sets each still hold 49% or more of the total weight will result in a split-brain. Each such set sees at most 51% of the total weight as "crashed", which does not exceed the 51% threshold, so each set keeps running.
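The generalization can be checked with a small hypothetical Java sketch: given the weight of each side of a partition, it counts how many sides keep running when each side treats every member outside itself as "crashed". It compares the rule as described (stop only if crashed weight exceeds 51%) with the proposed rule (stop if crashed weight is at least 50%). The class and method names are illustrative assumptions, not Geode APIs, and the model ignores timing and view-agreement details.

```java
import java.util.List;

/**
 * Hypothetical checker: counts how many sides of a partition keep running.
 * Each side is assumed to see every member outside itself as "crashed".
 * More than one surviving side means a split-brain.
 */
final class PartitionSurvivors {

  private PartitionSurvivors() {}

  // Rule as described in the ticket: stop only if crashed > 51% of total.
  static boolean stopsAsDescribed(int crashed, int total) {
    return 100L * crashed > 51L * total;
  }

  // Rule the ticket proposes: stop if crashed >= 50% of total.
  static boolean stopsAsProposed(int crashed, int total) {
    return 2L * crashed >= total;
  }

  static int survivors(List<Integer> sideWeights, boolean proposedRule) {
    int total = 0;
    for (int w : sideWeights) {
      total += w;
    }
    int running = 0;
    for (int own : sideWeights) {
      int crashed = total - own; // all weight on the other sides looks crashed
      boolean stops = proposedRule
          ? stopsAsProposed(crashed, total)
          : stopsAsDescribed(crashed, total);
      if (!stops) {
        running++;
      }
    }
    return running;
  }

  public static void main(String[] args) {
    List<Integer> sides = List.of(49, 49, 2); // two sides each hold 49% of the weight
    System.out.println(survivors(sides, false)); // 2: split-brain
    System.out.println(survivors(sides, true));  // 0: every side stops
  }
}
```

Under the proposed rule a side keeps running only if it holds strictly more than half of the total weight, so at most one side can survive; with an exact 50/50 split, both sides stop rather than diverge.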

            People

              Bill Burcham (burcham)
              Votes: 0
              Watchers: 2
