[QPID-33] Introduce clustering for high availability & fault tolerance - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.18
Component/s: Broker-J
Labels: None

Description

This task has been created as an initial placeholder from which many further tasks are expected to derive.

We currently have a clustering implementation that provides scalability but not high availability. If a broker in a cluster fails, its clients can fail over to another broker in the same cluster, but we cannot restart the failed broker on another node at its last state before failure using the saved state (from shared storage).

The other brokers in a cluster know about each other's queues etc. (via broadcasting), but not about any action the failed broker was processing, so we could potentially suffer message loss and state disconnect. Also note that cluster membership does not currently imply any automatic failover behaviour.

We know that there are users who require HA/fault tolerant clustering with 99.999% availability.
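To make the 99.999% figure concrete, the small calculation below converts an availability target into a yearly downtime budget. It is only an illustrative sketch: the function name `allowed_downtime_minutes` and the 365-day year are assumptions for this example, not part of Qpid.

```python
def allowed_downtime_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Return the downtime budget in minutes for an availability target.

    Hypothetical helper for illustration; assumes a 365-day year by default.
    """
    total_minutes = days * 24 * 60  # minutes in the measurement period
    return total_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # "Five nines" leaves a little over five minutes of downtime per year.
    print(round(allowed_downtime_minutes(99.999), 2))  # prints 5.26
```

With a budget this small, a manual restart of a failed broker would consume most or all of the yearly allowance, which is why state recovery on another node matters here.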

A holding page for clustering and HA notes, including use case content, exists here: http://cwiki.apache.org/confluence/display/qpid/ClusteringHA

The analysis for this task will involve expanding the design documentation and inviting review before work starts on the implementation. It also requires a thorough understanding of the protocol.

People

Assignee: Keith Wall

Reporter: Marnie McCormack

Votes: 0

Watchers: 1

Dates

Created: 12/Oct/06 13:28

Updated: 11/Feb/15 20:07

Resolved: 14/Sep/12 15:15