[HADOOP-10684] Extend HA support for more use cases - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: ha
Labels: None

Description

We'd like the current HA framework to be more configurable from a behavior standpoint. In particular:

Add the ability for a HAServiceTarget to survive a configurable number of consecutive health check failures (default of 0) before the HealthMonitor (HM) reports the service as not responding or unhealthy. For instance, base the HM on a state machine whose default implementation can be overridden via a method or constructor argument. The default would behave the same as it does today.
- If a target fails a health check but has not exceeded the maximum number of consecutive check failures, it would be desirable for the target and/or controller to be alerted.
  - i.e. introduce a SERVICE_DYING state
- Additionally, it would be desirable to have a mechanism, similar to fencing semantics, for “reviving” a service that has transitioned to SERVICE_DYING.
  - i.e. attemptRevive(…)
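To make the threshold idea concrete, here is a minimal standalone sketch of the proposed state machine. It does not use or modify Hadoop's HealthMonitor; `SketchState`, `FailureTolerantTracker`, and `onCheckResult` are hypothetical names chosen for illustration, and SERVICE_DYING is the state proposed above, not an existing one.

```java
// Hypothetical sketch only: none of these types exist in Hadoop.
// SERVICE_DYING is the state proposed in this issue.
enum SketchState { SERVICE_HEALTHY, SERVICE_DYING, SERVICE_UNHEALTHY }

final class FailureTolerantTracker {
    // Number of consecutive failures the target may survive.
    // 0 reproduces today's behavior: the first failure is reported.
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;
    private SketchState state = SketchState.SERVICE_HEALTHY;

    FailureTolerantTracker(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 0) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 0");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // Feed in the outcome of one health check and get the resulting state.
    SketchState onCheckResult(boolean healthy) {
        if (healthy) {
            // Any successful check resets the failure streak.
            consecutiveFailures = 0;
            state = SketchState.SERVICE_HEALTHY;
        } else {
            consecutiveFailures++;
            // Within the tolerated streak the target is only "dying";
            // once the streak exceeds the limit it is reported unhealthy.
            state = consecutiveFailures > maxConsecutiveFailures
                    ? SketchState.SERVICE_UNHEALTHY
                    : SketchState.SERVICE_DYING;
        }
        return state;
    }

    SketchState current() {
        return state;
    }
}
```

With a limit of 2, two failed checks in a row leave the target in SERVICE_DYING (the point at which an alert or a revive attempt could be triggered), and the third moves it to SERVICE_UNHEALTHY. A successful check at any point resets the streak.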
Add the ability for a service to fail completely (no failover or failback possible). There are scenarios where allowing a failover or failback could cause more damage.
- E.g. a recovered master with stale data. The master may have been recovered manually, which invites human error.
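One possible shape for this, sketched under assumptions: a one-way latch that, once set, keeps a target out of any election regardless of later health checks. `TerminalFailureLatch` and its methods are hypothetical names, not part of the existing HA framework.

```java
// Hypothetical sketch only: not an existing Hadoop class.
final class TerminalFailureLatch {
    private boolean terminallyFailed = false;
    private String reason = null;

    // Mark the target as failed for good. The first recorded reason is kept.
    synchronized void failPermanently(String why) {
        if (!terminallyFailed) {
            terminallyFailed = true;
            reason = why;
        }
    }

    // A target may join an election only if it is healthy and has
    // never been latched as terminally failed.
    synchronized boolean mayJoinElection(boolean healthy) {
        return healthy && !terminallyFailed;
    }

    // Returns null while the target has not been latched.
    synchronized String reason() {
        return reason;
    }
}
```

In the stale-data example, the latch would be set when the master is recovered, so a subsequent healthy check would not be enough for it to reclaim leadership.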
Add affinity to a particular HAServiceTarget.
- In other words, allow the controller to prefer one target over another when deciding leadership.
- If a higher-affinity but previously unhealthy target becomes healthy, it should be allowed to become the leader.
- Likewise, if two targets are racing for a ZooKeeper lock, the controller should "prefer" the higher-affinity target.
- It might make more sense to add a different implementation/subclass of the ZKFailoverController (i.e. ZKAffinityFailoverController) rather than modifying current behavior.
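The preference rules above could be sketched as follows. This is a standalone illustration with no ZooKeeper or ZKFailoverController involvement; `AffinityCandidate`, `AffinityElector`, the integer `affinity` field, and the tie-breaking rule (first listed wins) are all assumptions made for the example.

```java
// Hypothetical sketch only: these types are not part of Hadoop.
final class AffinityCandidate {
    final String name;
    final int affinity;    // assumed convention: higher value = more preferred
    final boolean healthy;

    AffinityCandidate(String name, int affinity, boolean healthy) {
        this.name = name;
        this.affinity = affinity;
        this.healthy = healthy;
    }
}

final class AffinityElector {
    // Pick the healthy candidate with the highest affinity.
    // Ties go to the candidate listed first; returns null if none is healthy.
    static AffinityCandidate preferred(AffinityCandidate[] candidates) {
        AffinityCandidate best = null;
        for (AffinityCandidate c : candidates) {
            if (c.healthy && (best == null || c.affinity > best.affinity)) {
                best = c;
            }
        }
        return best;
    }

    // A healthy challenger takes over if the leader is unhealthy
    // or if the challenger has strictly higher affinity.
    static boolean shouldPreempt(AffinityCandidate leader, AffinityCandidate challenger) {
        return challenger.healthy
                && (!leader.healthy || challenger.affinity > leader.affinity);
    }
}
```

For example, with a target `nn1` at affinity 10 that is currently unhealthy and `nn2` at affinity 5 that is healthy, `preferred` returns `nn2`; once `nn1` reports healthy again, `shouldPreempt(nn2, nn1)` returns true and `nn1` would be allowed to take leadership back.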

Please comment with thoughts/ideas/etc...
Thanks.

People

Assignee: Unassigned

Reporter: Paul Rubio

Votes: 0

Watchers: 5

Dates

Created: 11/Jun/14 21:53

Updated: 11/Jun/14 21:53