  Geode / GEODE-9680

Newly Started/Restarted Locators are Susceptible to Split-Brains


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: None
    • Component/s: membership
    • Labels: None

    Description

      The issues described here are present in all versions of Geode (this is not new to 1.15.0)…

      Geode is built on the assumption that views progress linearly in a sequence. If that sequence ever forks into two or more parallel lines, then we have a "split brain".

      In a split brain condition, each of the parallel views is independent. It's as if you have more than one system running concurrently. It's possible, e.g., for some clients to connect to members of one view and other clients to connect to members of another view. Updates to members in one view are not seen by members of a parallel view.

      Geode views are produced by a coordinator. As long as only a single coordinator is running, there is no possibility of a split brain. Split brain arises when more than one coordinator is producing views at the same time.
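      To make the "linear sequence" idea concrete, here is a toy Python model. It does not use Geode's actual view or membership classes. Each member's view history is a list of (view_id, coordinator, members) tuples. The views are linear as long as, for every pair of members, one history is a prefix of the other. The names View, is_prefix, and histories_are_linear are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Toy model (an assumption, not Geode's real view class):
# a view is (view_id, coordinator, members).
View = Tuple[int, str, FrozenSet[str]]


def is_prefix(a: List[View], b: List[View]) -> bool:
    """True if view history `a` is a prefix of view history `b`."""
    return len(a) <= len(b) and b[: len(a)] == a


def histories_are_linear(histories: List[List[View]]) -> bool:
    """True if every member's view history lies on one line of views.

    Two members may be at different points in the sequence (one history is a
    prefix of the other). If neither history is a prefix of the other, the
    sequence has forked into parallel lines: a split brain.
    """
    for i, a in enumerate(histories):
        for b in histories[i + 1:]:
            if not (is_prefix(a, b) or is_prefix(b, a)):
                return False
    return True


v1 = (1, "locator-a", frozenset({"locator-a"}))
v2 = (2, "locator-a", frozenset({"locator-a", "locator-b"}))
v3 = (3, "locator-a", frozenset({"locator-a", "locator-b", "server-1"}))
# locator-b saw v1 and v2, lost contact with locator-a, and then acted as a
# second coordinator, producing its own view 3.
w3 = (3, "locator-b", frozenset({"locator-b"}))

print(histories_are_linear([[v1, v2], [v1, v2, v3]]))      # True: one member is just behind
print(histories_are_linear([[v1, v2, v3], [v1, v2, w3]]))  # False: the sequence forked
```

      A member that is merely behind does not make the views non-linear. Two coordinators issuing different views with the same position in the sequence do.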

      Each Geode member (peer) is started with the locators configuration parameter. That parameter specifies the locator(s) the member uses to find the (already running!) coordinator (member) to join.

      When a locator (member) starts, it goes through this sequence to find the coordinator:

      1. it first tries to find the coordinator through one of the (other) configured locators
      2. if it can't contact any of those, it tries contacting non-locator (cache server) members it has retrieved from the "view persistence" (.dat) file

      If it hasn't found a coordinator to join with, then the locator may become a coordinator.
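      The startup sequence above can be sketched in Python as a toy model. It is not Geode's implementation. probe, find_coordinator, and legacy_startup are hypothetical names, and a probe simply returns the coordinator's address or None when a member can't be reached. The usage at the end shows the hazard: when every configured locator and every .dat entry is unreachable, the locator becomes a coordinator even though one may already be running.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical probe: ask the member at `address` who the current coordinator
# is. Returns the coordinator's address, or None if the member can't be reached
# (not running, hostname not resolvable, network partition, ...).
Probe = Callable[[str], Optional[str]]


def find_coordinator(configured_locators: Iterable[str],
                     dat_file_members: Iterable[str],
                     probe: Probe) -> Optional[str]:
    # Step 1: try the (other) locators from the locators parameter.
    for address in configured_locators:
        coordinator = probe(address)
        if coordinator is not None:
            return coordinator
    # Step 2: try non-locator members recorded in the view persistence .dat file.
    for address in dat_file_members:
        coordinator = probe(address)
        if coordinator is not None:
            return coordinator
    return None


def legacy_startup(self_address: str,
                   configured_locators: Iterable[str],
                   dat_file_members: Iterable[str],
                   probe: Probe) -> Tuple[str, str]:
    coordinator = find_coordinator(configured_locators, dat_file_members, probe)
    if coordinator is not None:
        return ("joined", coordinator)
    # Nothing found: in this simplified model the locator always becomes
    # a coordinator, even if another coordinator is running out of reach.
    return ("became-coordinator", self_address)


healthy = {"locator-a": "locator-a"}.get  # locator-a answers
partitioned = {}.get                      # nothing answers

print(legacy_startup("locator-b", ["locator-a"], [], healthy))
# ('joined', 'locator-a')
print(legacy_startup("locator-b", ["locator-a"], ["server-9"], partitioned))
# ('became-coordinator', 'locator-b')
```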

      Sometimes this is OK: if no other coordinator is currently running, this behavior is fine. An example is when an administrator is starting up a brand new cluster. In that case we want the very first locator we start to become the coordinator.

      But there are a number of situations where another coordinator may already be running but cannot be reached:

      • if the administrator/operator wants to start up a brand new cluster with multiple locators and…
        • maybe Geode is running in a managed environment like Kubernetes and the locators' hostnames are not (yet) resolvable in DNS
        • maybe there is a network partition between the starting locators so they can't communicate
        • maybe the existing locators or coordinator are running very slowly or the network is degraded. This is effectively the same as the network partition just mentioned
      • if a cluster is already running and the administrator/operator wants to scale it up by starting/adding a new locator, Geode is susceptible to the same issues just mentioned
      • if a cluster is already running and the administrator/operator needs to restart a locator, e.g. for a rolling upgrade, if none of the locators in the locators configuration parameter are reachable (maybe they are not running, or maybe there is a network partition) and…
        • if the "view persistence" .dat file is missing or deleted
        • or if the current set of running Geode members has evolved so far that the coordinates (host+port) in the .dat file are completely out of date

      In each of those cases, the newly starting locator will become a coordinator and will start producing views. Now we'll have the old coordinator producing views at the same time as the new one.

      When This Ticket is Complete

      There are a number of possible solutions to these problems. Here is one possible solution…

      Geode will offer a locator startup mode (via a TBD LocatorLauncher startup parameter) that prevents that locator from becoming a coordinator. In that mode, it will be possible for an administrator/operator to avoid many of the problematic scenarios mentioned above, while retaining the ability (via some other mode) to start a first locator which is allowed to become a coordinator.

      For purposes of discussion we'll call the startup mode that allows the locator to become a coordinator "seed" mode, and we'll call the new startup mode that prevents the locator from becoming a coordinator before first joining, "join-only" mode.

      After this mode split is implemented, it is envisioned that to start a brand new cluster, an administrator/operator will start the first locator in "seed" mode. After that, the operator will start all subsequent locators in "join-only" mode. If network partitions occur during startup, those newly started ("join-only") nodes will exit with a failure status; under no circumstances will they ever become coordinators.

      To add a locator to a running cluster, an operator starts it in "join-only" mode. The new member will similarly either join an existing coordinator or exit with a failure status, thereby avoiding split brains.

      When an operator restarts a locator, e.g. during a rolling upgrade, they will restart it in "join-only" mode. If a network partition is encountered, or the .dat file is missing or stale, the restarted locator will exit with a failure status and split brain will be avoided.
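      The proposed mode split can be sketched in Python. This is not a proposed API: StartupMode, JoinFailed, and start_with_mode are hypothetical names, and discover stands in for the existing lookup (locators parameter, then .dat file). The sketch reads seed mode as current behavior, i.e. discover first and become coordinator only if nothing is found. Whether seed mode should behave that way is still open in the FAQ.

```python
import enum
from typing import Callable, Optional


class StartupMode(enum.Enum):
    SEED = "seed"            # allowed to become coordinator if none is found
    JOIN_ONLY = "join-only"  # must join an existing coordinator or fail


class JoinFailed(Exception):
    """A join-only locator found no coordinator; a launcher would exit non-zero."""


# `discover` stands in for the existing lookup (locators parameter, then .dat
# file) and returns the coordinator's address or None.
Discover = Callable[[], Optional[str]]


def start_with_mode(self_address: str, mode: StartupMode,
                    discover: Discover) -> str:
    """Return the address of the coordinator this locator ends up with."""
    coordinator = discover()
    if coordinator is not None:
        return coordinator
    if mode is StartupMode.JOIN_ONLY:
        # Under no circumstances does a join-only locator become coordinator.
        raise JoinFailed(f"{self_address}: no coordinator reachable")
    return self_address  # SEED: become the coordinator of a new cluster


def nothing_reachable() -> Optional[str]:
    return None


# Brand-new cluster: the first locator starts in seed mode...
first = start_with_mode("locator-a", StartupMode.SEED, nothing_reachable)
# ...and later locators start join-only and join it.
second = start_with_mode("locator-b", StartupMode.JOIN_ONLY, lambda: first)
# Rolling restart behind a partition (or with a stale .dat file): fail instead.
try:
    start_with_mode("locator-c", StartupMode.JOIN_ONLY, nothing_reachable)
except JoinFailed as err:
    print("exit with failure status:", err)
```

      The only new decision is the JOIN_ONLY branch. A join-only locator that finds nothing raises an error instead of falling through to becoming coordinator.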

      FAQ

      Q: What should happen if a locator is started in seed mode, but it can see another view member is already acting as coordinator?

      A: TBD. The question here is whether a locator started in seed mode should first consult the locators configuration parameter and attempt to join any coordinator it finds through them. On the one hand, this could be viewed as a safety feature. On the other hand, it increases complexity relative to the simpler behavior, to wit: either become a coordinator or fail.

       

      Q: How long will a join-only locator wait before giving up and exiting?

      A: TBD. Should it fail fast (fail immediately), or should it retry?
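      Both options can be expressed as one bounded-retry loop. In this hypothetical Python sketch, max_attempts=1 is fail-fast and larger values retry at a fixed interval. join_only, max_attempts, retry_interval_s, and make_discover are illustrative names, not Geode parameters. Returning None stands for "exit with a failure status".

```python
import time
from typing import Callable, List, Optional


def join_only(discover: Callable[[], Optional[str]],
              max_attempts: int = 1,
              retry_interval_s: float = 0.0,
              sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Return the coordinator found, or None to signal "exit with failure".

    max_attempts=1 is fail-fast; larger values retry at a fixed interval.
    """
    for attempt in range(1, max_attempts + 1):
        coordinator = discover()
        if coordinator is not None:
            return coordinator
        if attempt < max_attempts:
            sleep(retry_interval_s)  # no sleep after the final attempt
    return None


def make_discover(succeed_on: int) -> Callable[[], Optional[str]]:
    """Simulate a coordinator that first becomes reachable on probe `succeed_on`
    (e.g. a locator hostname that takes a while to appear in DNS)."""
    calls = 0

    def discover() -> Optional[str]:
        nonlocal calls
        calls += 1
        return "locator-a" if calls >= succeed_on else None

    return discover


print(join_only(make_discover(3)))  # None: fail fast gives up immediately
slept: List[float] = []
print(join_only(make_discover(3), max_attempts=5, retry_interval_s=2.0,
                sleep=slept.append))  # locator-a, after sleeping twice
```

      Either way, the locator never becomes a coordinator. Retrying only changes how long it waits for a coordinator that is slow to become reachable, such as during the DNS delay in the Kubernetes scenario above. The sketch does not settle whether a total timeout or a backoff policy would fit better.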

       

      Q: Is "seed" mode exactly equal to current Geode functionality, and therefore does that mode require no new startup parameter?

      A: TBD. While it's true that seed mode sounds a bit like current behavior, we may wish to have a new, explicit seed mode that causes the starting member to either form a new cluster or fail, i.e. it never joins an existing cluster.
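      The strict variant described in this answer can be sketched as follows. start_strict_seed and SeedFailed are hypothetical names, and discover again stands in for the existing lookup. This variant does consult the locators parameter (one possible answer to the first FAQ question), but it treats finding a running coordinator as an error instead of joining it.

```python
from typing import Callable, Optional


class SeedFailed(Exception):
    """A strict seed locator found a running coordinator; exit non-zero."""


def start_strict_seed(self_address: str,
                      discover: Callable[[], Optional[str]]) -> str:
    """Form a new cluster (this locator becomes coordinator) or fail.

    Unlike current behavior, a strict seed never joins an existing cluster:
    finding a coordinator is treated as an operator error.
    """
    existing = discover()
    if existing is not None:
        raise SeedFailed(f"{self_address}: coordinator {existing} is already running")
    return self_address


print(start_strict_seed("locator-a", lambda: None))  # locator-a
try:
    start_strict_seed("locator-b", lambda: "locator-a")
except SeedFailed as err:
    print("exit with failure status:", err)
```

      In this sketch a strict seed still relies on discovery. If the running coordinator is out of reach, the strict seed forms a second cluster anyway. So it guards against an operator accidentally starting a second seed, not against network partitions.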

       

      Q: Will members started with current Geode functionality, join-only mode, and seed mode (if it differs from current functionality) be interoperable?

      A: Yes. You'll be able to start a locator/coordinator with current functionality (or seed mode if that's distinct) and then start a new member in join-only mode and have it join that cluster. Alternatively, you can start a new member with current Geode functionality and have it join (but be aware that the new member might form its own cluster!)

            People

              Assignee: Unassigned
              Reporter: Bill Burcham
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: