[CLOUDSTACK-7184] HA should wait for at least 'xen.heartbeat.interval' sec before starting HA on vm's when host is marked down - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 4.3.0, 4.4.0, 4.5.0
Fix Version/s: None
Component/s: Hypervisor Controller, Management Server, XenServer
Security Level: Public (Anyone can view this level - this is the default.)
Labels: None
Environment: CloudStack 4.3 with XenServer 6.2 hypervisors

Description

A hypervisor was isolated for 30 seconds because of a network issue. CloudStack detected this, marked the host as down, and immediately started HA. Just 18 seconds later the hypervisor returned, and we ended up with 5 VMs running on two hypervisors at the same time.

This, of course, resulted in file system corruption and the loss of the VMs. One side of the story is why XenServer allowed this to happen (I will not bother you with that one). The CloudStack side of the story: HA should start only after the host has been down for at least xen.heartbeat.interval seconds. If the host stays down that long, the Xen heartbeat script will fence the hypervisor and prevent corruption. If it comes back sooner, nothing should happen.
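
To make the requested behaviour concrete, here is a minimal sketch of such a wait in Java. It is not CloudStack code: the class `HaGracePeriod` and the method `mayStartHa` are hypothetical names used only for illustration, and the interval is passed in by the caller instead of being read from the real `xen.heartbeat.interval` setting.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical gate (not part of CloudStack) that allows HA to start only
 * after a host has been down for at least the heartbeat interval.
 */
final class HaGracePeriod {
    // Assumed to carry the value of xen.heartbeat.interval, supplied by the caller.
    private final Duration heartbeatInterval;

    HaGracePeriod(Duration heartbeatInterval) {
        if (heartbeatInterval.isNegative()) {
            throw new IllegalArgumentException("heartbeat interval must not be negative");
        }
        this.heartbeatInterval = heartbeatInterval;
    }

    /**
     * Returns true only once the host has been down for at least the
     * heartbeat interval, i.e. long enough for the heartbeat script to
     * have fenced an isolated hypervisor.
     */
    boolean mayStartHa(Instant markedDownAt, Instant now) {
        Duration downFor = Duration.between(markedDownAt, now);
        return downFor.compareTo(heartbeatInterval) >= 0;
    }
}
```

With an illustrative interval of 60 seconds, a host that returns 18 seconds after being marked down would never reach the point where `mayStartHa` returns true, so no VM would be started a second time.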

Logs (short):
2014-07-25 05:03:28,596 WARN [c.c.a.m.DirectAgentAttache] (DirectAgent-122:ctx-690badc5) Unable to get current status on 505(mccpvmXX)
.....
2014-07-25 05:03:31,920 ERROR [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-11b9af3e) Host is down: 505-mccpvmXX. Starting HA on the VMs
.....
2014-07-25 05:03:49,655 DEBUG [c.c.h.Status] (ClusteredAgentManager Timer:ctx-0e00979c) Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id = 505, name = mccpvmXX]

CloudStack marks host down: 2014-07-25 05:03:31,920
CloudStack marks host up: 2014-07-25 05:03:49,655
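
The "18 seconds" figure can be checked from the two timestamps above. The following is a small sketch, with `LogGap` as a hypothetical helper name, that parses timestamps in the format shown in the log lines and returns the time between them.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for measuring the gap between two log timestamps. */
final class LogGap {
    // Matches the timestamp layout in the log excerpt, e.g. "2014-07-25 05:03:31,920".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the elapsed time from the earlier to the later timestamp. */
    static Duration between(String earlier, String later) {
        LocalDateTime start = LocalDateTime.parse(earlier, LOG_TS);
        LocalDateTime end = LocalDateTime.parse(later, LOG_TS);
        return Duration.between(start, end);
    }
}
```

Calling `LogGap.between("2014-07-25 05:03:31,920", "2014-07-25 05:03:49,655").toMillis()` gives 17735, so the host was back 17.7 seconds after HA had been started.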

People

Assignee: Daan

Reporter: Remi Bergsma

Votes: 1

Watchers: 8

Dates

Created: 25/Jul/14 15:27

Updated: 19/Dec/14 13:50

Resolved: 18/Sep/14 09:26