Hadoop Common / HADOOP-10684

Extend HA support for more use cases


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ha
    • Labels: None

    Description

      We'd like the current HA framework's behavior to be more configurable. In particular:

      • Add the ability for an HAServiceTarget to survive a configurable number of consecutive health-check failures (default 0) before the HealthMonitor (HM) reports the service as not responding or unhealthy. For instance, the HM could be driven by a state machine whose default implementation can be overridden via a method or constructor argument. The default implementation would behave exactly as today.
        • If a target fails a health check without exceeding the maximum number of consecutive failures, the target and/or the controller should ideally be alerted.
          • E.g., introduce a SERVICE_DYING state.
          • Additionally, it’d be desirable to have a mechanism, similar to fencing semantics, for “reviving” a service that has transitioned to SERVICE_DYING, e.g., attemptRevive(…).
      • Add the ability to let a service fail completely, with no failover or failback possible. In some scenarios, allowing a failover or failback could cause further damage.
        • E.g., a master that comes back with stale data, possibly because it was recovered manually by mistake (human error).
      • Add affinity to a particular HAServiceTarget.
        • In other words, allow the controller to prefer one target over another when deciding leadership.
        • If a previously unhealthy target with higher affinity becomes healthy, it should be allowed to become the leader.
        • Likewise, if two targets are racing for the ZooKeeper lock, the controller should "prefer" the higher-affinity target.
        • It might make more sense to add a separate implementation or subclass of ZKFailoverController (e.g., a ZKAffinityFailoverController) than to modify the current behavior.
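
      A minimal sketch of the configurable health-check state machine proposed above, assuming the HealthMonitor delegated each check result to a pluggable policy object. All names here (TargetHealth, FailureTolerantHealthPolicy, markFailedPermanently, and the BooleanSupplier-based signature given to attemptRevive) are hypothetical and are not existing Hadoop APIs; SERVICE_FAILED stands in for the proposed "completely failed, no failover or failback" state.

```java
import java.util.function.BooleanSupplier;

// Hypothetical states for this sketch; this is NOT Hadoop's HealthMonitor.State.
// SERVICE_DYING and SERVICE_FAILED model the states proposed in this issue.
enum TargetHealth { SERVICE_HEALTHY, SERVICE_DYING, SERVICE_UNHEALTHY, SERVICE_FAILED }

// Hypothetical pluggable state machine a HealthMonitor could delegate to.
// With maxConsecutiveFailures = 0 (the default), the first failed check
// yields SERVICE_UNHEALTHY, matching the current behavior.
class FailureTolerantHealthPolicy {
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;
    private TargetHealth state = TargetHealth.SERVICE_HEALTHY;

    FailureTolerantHealthPolicy() {
        this(0);
    }

    FailureTolerantHealthPolicy(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 0) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 0");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // Records the result of one health check and returns the new state.
    TargetHealth onHealthCheck(boolean passed) {
        if (state == TargetHealth.SERVICE_FAILED) {
            return state; // terminal: the target never becomes eligible again
        }
        if (passed) {
            consecutiveFailures = 0;
            state = TargetHealth.SERVICE_HEALTHY;
        } else {
            consecutiveFailures++;
            // Tolerated failures report SERVICE_DYING so the target and/or the
            // controller can be alerted before failover is considered.
            state = consecutiveFailures > maxConsecutiveFailures
                    ? TargetHealth.SERVICE_UNHEALTHY
                    : TargetHealth.SERVICE_DYING;
        }
        return state;
    }

    // Hypothetical revive hook, analogous to fencing; it only runs while DYING.
    // A successful revive resets the failure count, and the next health check
    // decides whether the target is healthy again.
    boolean attemptRevive(BooleanSupplier reviver) {
        if (state != TargetHealth.SERVICE_DYING) {
            return false;
        }
        boolean revived = reviver.getAsBoolean();
        if (revived) {
            consecutiveFailures = 0;
        }
        return revived;
    }

    // Marks the target as permanently failed, e.g. a master recovered with stale data.
    void markFailedPermanently() {
        state = TargetHealth.SERVICE_FAILED;
    }

    // Only a healthy target may become active through failover or failback.
    boolean isEligibleForActive() {
        return state == TargetHealth.SERVICE_HEALTHY;
    }

    TargetHealth state() {
        return state;
    }
}
```

      For example, with new FailureTolerantHealthPolicy(2) the first two consecutive failed checks yield SERVICE_DYING, the third yields SERVICE_UNHEALTHY, and any passing check resets the count. Keeping the terminal state inside the same state machine means the "no failover or failback" rule is enforced in one place.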
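
      A similarly hedged sketch of the affinity rules, written as pure decision logic with no ZooKeeper calls. Candidate, AffinityElection, and its methods are hypothetical names for illustration, not Hadoop classes. In particular, making lower-affinity targets wait longer before trying to take the lock is only one assumed way to "prefer" the higher-affinity target in a lock race; neither the issue nor ZooKeeper prescribes it.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified view of an HA target; not Hadoop's HAServiceTarget.
final class Candidate {
    private final String id;
    private final int affinity; // higher value = more preferred as leader
    private final boolean healthy;

    Candidate(String id, int affinity, boolean healthy) {
        this.id = id;
        this.affinity = affinity;
        this.healthy = healthy;
    }

    String id() { return id; }
    int affinity() { return affinity; }
    boolean healthy() { return healthy; }
}

// Hypothetical affinity rules a ZKAffinityFailoverController-style controller might apply.
final class AffinityElection {
    private AffinityElection() {}

    // Returns the healthy candidate with the highest affinity (the first one wins
    // on ties), or empty if no candidate is healthy.
    static Optional<Candidate> preferredLeader(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (c.healthy() && (best == null || c.affinity() > best.affinity())) {
                best = c;
            }
        }
        return Optional.ofNullable(best);
    }

    // True if another healthy candidate has strictly higher affinity than the
    // current leader, e.g. a higher-affinity target that has just recovered.
    static boolean shouldYield(Candidate leader, List<Candidate> candidates) {
        for (Candidate c : candidates) {
            if (!c.id().equals(leader.id()) && c.healthy() && c.affinity() > leader.affinity()) {
                return true;
            }
        }
        return false;
    }

    // Assumed technique for a lock race: each target waits
    // (highestAffinity - ownAffinity) * stepMillis before trying to take the
    // ZooKeeper lock, so the highest-affinity target usually tries first.
    static long lockAttemptDelayMillis(Candidate self, int highestAffinity, long stepMillis) {
        return Math.max(0, highestAffinity - self.affinity()) * stepMillis;
    }
}
```

      Because the delay only biases timing, a lower-affinity target can still win the race; shouldYield then lets the controller hand leadership back once the preferred target is healthy.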

      Please comment with thoughts, ideas, etc.
      Thanks.

          People

            Assignee: Unassigned
            Reporter: Paul Rubio (prubio)
            Votes: 0
            Watchers: 5
