[CLOUDSTACK-8943] KVM HA is broken, let's fix it - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Security Level: Public (Anyone can view this level - this is the default.)
Labels: None
Environment: Linux distros with KVM/libvirt

Description

Currently, KVM HA works by monitoring a heartbeat file on NFS. It can often fail whenever this network share slows down, causing the hypervisors to reboot.

This can be particularly annoying when you have other kinds of primary storage in place that are working fine (e.g. people running Ceph).

Having to wait for the affected hypervisor (HV) to come back and declare that it is no longer running VMs is a bad idea; that HV could need hours or days of maintenance!
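To make the failure mode concrete, here is a minimal sketch of a timestamp-based heartbeat check. It assumes the host periodically writes the current time to a file on the shared NFS mount and that a monitor declares the host dead once that timestamp is older than a threshold. The 60 s threshold, the file layout and all function names are hypothetical illustrations, not CloudStack's actual implementation.

```python
import os
import tempfile
import time

# Hypothetical values and names, for illustration only; they are not
# CloudStack's real heartbeat script, settings or file paths.
HEARTBEAT_MAX_AGE_S = 60.0


def write_heartbeat(path: str, now: float) -> None:
    """Host side: record the current time in the shared heartbeat file."""
    with open(path, "w") as f:
        f.write(f"{now}\n")


def read_heartbeat(path: str) -> float:
    """Return the last timestamp written to the heartbeat file."""
    with open(path) as f:
        return float(f.read().strip())


def looks_dead(path: str, now: float, max_age_s: float = HEARTBEAT_MAX_AGE_S) -> bool:
    """Monitor side: treat the host as dead if its heartbeat is too old.

    Blind spot: a stale timestamp looks the same whether the host crashed
    or the NFS share was merely too slow to accept the host's write.
    """
    try:
        last = read_heartbeat(path)
    except (OSError, ValueError):
        return True  # an unreadable file also counts as a missed heartbeat
    return now - last > max_age_s


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        hb = os.path.join(d, "hb-host1")
        t0 = time.time()
        write_heartbeat(hb, t0)
        print(looks_dead(hb, t0 + 30))  # fresh heartbeat -> False
        # Simulate a slow share: the host is healthy, but its next write has
        # not landed 90 s later, so the monitor sees a stale timestamp.
        print(looks_dead(hb, t0 + 90))  # stale heartbeat -> True (false positive)
```

The point of the sketch is that the only signal is the age of a file on one shared mount, so storage latency alone can trigger fencing even when the host and its other primary storage are fine.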

This is embarrassing. How can we fix it? Ideas, suggestions? How are other hypervisors doing it?

Let's discuss, test, implement.

Issue Links

relates to: CLOUDSTACK-8643 Helper for KVM High Availability (Closed)

People

Assignee: Unassigned
Reporter: Nux
Votes: 1
Watchers: 8

Dates

Created: 09/Oct/15 23:14
Updated: 09/Oct/17 08:46