Status: In Progress
Security Level: Public (Anyone can view this level - this is the default.)
This issue was discovered, fixed and tested on KVM, but it applies to every hypervisor.
When maintenance mode is enabled on a host, the host state is set to 'PrepareForMaintenance' and its running VMs are migrated to another host. Once every VM has been migrated, the host moves to the 'Maintenance' state.
The checks are performed in the ResourceManagerImpl.checkAndMaintain() method:
- List VMs with host_id = HOST_ID
- List VMs with last_host_id = HOST_ID and state=Migrating
When both queries return no rows, the host can be put into Maintenance.
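The two checks above can be sketched as follows. This is only an illustration over an in-memory list, not the actual ResourceManagerImpl code: the class names VmRow and MaintenanceCheckSketch and all method names are hypothetical, and the real checks run as database queries against vm_instance.

```java
import java.util.List;

/** Hypothetical in-memory stand-in for one row of vm_instance. */
final class VmRow {
    final long id;
    final String state;     // state column, e.g. "Running" or "Migrating"
    final Long hostId;      // host_id column (may be NULL)
    final Long lastHostId;  // last_host_id column (may be NULL)

    VmRow(long id, String state, Long hostId, Long lastHostId) {
        this.id = id;
        this.state = state;
        this.hostId = hostId;
        this.lastHostId = lastHostId;
    }
}

/** Illustrative version of the two checks; names are assumptions. */
final class MaintenanceCheckSketch {

    /** Check 1: count VMs with host_id = HOST_ID. */
    static long countOnHost(List<VmRow> vms, long hostId) {
        return vms.stream()
                .filter(vm -> vm.hostId != null && vm.hostId == hostId)
                .count();
    }

    /** Check 2: count VMs with last_host_id = HOST_ID and state = Migrating. */
    static long countMigratingFrom(List<VmRow> vms, long hostId) {
        return vms.stream()
                .filter(vm -> vm.lastHostId != null && vm.lastHostId == hostId
                        && "Migrating".equals(vm.state))
                .count();
    }

    /** The host may enter Maintenance only when both counts are zero. */
    static boolean canEnterMaintenance(List<VmRow> vms, long hostId) {
        return countOnHost(vms, hostId) == 0
                && countMigratingFrom(vms, hostId) == 0;
    }
}
```

When both counts are taken from the same snapshot of the rows, as here, a VM that is still migrating away from the host keeps canEnterMaintenance() false.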
When a VM is being migrated to DEST_HOST, its host_id column is set to DEST_HOST, last_host_id to ORIGIN_HOST and state to Migrating. If the migration then fails, host_id is set back, so that host_id = last_host_id = ORIGIN_HOST.
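The column changes described above can be written out as a small state sketch. The class VmRecordSketch and its methods are hypothetical, and the state a VM returns to after a failed migration is an assumption (shown here as "Running"), since the description only specifies the host_id and last_host_id values.

```java
/** Hypothetical stand-in for the relevant vm_instance columns of one VM. */
final class VmRecordSketch {
    String state;     // state column
    long hostId;      // host_id column
    long lastHostId;  // last_host_id column

    VmRecordSketch(String state, long hostId, long lastHostId) {
        this.state = state;
        this.hostId = hostId;
        this.lastHostId = lastHostId;
    }

    /** Migration starts: host_id = DEST_HOST, last_host_id = ORIGIN_HOST, state = Migrating. */
    void startMigration(long destHost) {
        lastHostId = hostId;   // remember the origin host
        hostId = destHost;     // the row already points at the destination
        state = "Migrating";
    }

    /** Migration fails: host_id = last_host_id = ORIGIN_HOST. */
    void failMigration() {
        hostId = lastHostId;
        state = "Running";     // assumption: the post-failure state is not given in the report
    }
}
```

Note that while the VM is in the Migrating state, its host_id already points at DEST_HOST, which is why a query on host_id = ORIGIN_HOST alone does not see it.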
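Because the two checks run one after the other, a migration that fails between them goes unnoticed. The sequence is:
- Enable maintenance mode on ORIGIN_HOST
- VMs start being migrated to another host, say DEST_HOST
- checkAndMaintain() starts:
  - The first check passes (no VM has host_id = ORIGIN_HOST, as the VMs are being migrated)
  - Before the second check runs, one or more migrations fail
  - The second check passes (no VM with last_host_id = ORIGIN_HOST is in the Migrating state any more), even though VMs are still running on the host because their migrations failed
- The host goes into the Maintenance state.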
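The interleaving above can be reproduced deterministically with a small simulation. This is a sketch only: RaceSketch, Vm, checkWithInterleaving() and the betweenChecks hook are hypothetical names, the checks operate on an in-memory list instead of the database, and the "Running" state after a failed migration is an assumption.

```java
import java.util.List;

/** Deterministic illustration of the race; all names are hypothetical. */
final class RaceSketch {
    static final long ORIGIN_HOST = 1L;
    static final long DEST_HOST = 2L;

    /** Hypothetical stand-in for a vm_instance row. */
    static final class Vm {
        String state;
        long hostId;
        long lastHostId;

        Vm(String state, long hostId, long lastHostId) {
            this.state = state;
            this.hostId = hostId;
            this.lastHostId = lastHostId;
        }
    }

    /** First check: passes when no VM has host_id = ORIGIN_HOST. */
    static boolean firstCheckPasses(List<Vm> vms) {
        for (Vm vm : vms) {
            if (vm.hostId == ORIGIN_HOST) return false;
        }
        return true;
    }

    /** Second check: passes when no VM is Migrating with last_host_id = ORIGIN_HOST. */
    static boolean secondCheckPasses(List<Vm> vms) {
        for (Vm vm : vms) {
            if (vm.lastHostId == ORIGIN_HOST && "Migrating".equals(vm.state)) return false;
        }
        return true;
    }

    /** A failed migration puts the VM back on its origin host. */
    static void failMigration(Vm vm) {
        vm.hostId = vm.lastHostId;
        vm.state = "Running"; // assumption: post-failure state is not given in the report
    }

    /** Runs both checks in sequence, running betweenChecks after the first one. */
    static boolean checkWithInterleaving(List<Vm> vms, Runnable betweenChecks) {
        boolean first = firstCheckPasses(vms);
        betweenChecks.run(); // simulates a concurrent event between the two queries
        boolean second = secondCheckPasses(vms);
        return first && second;
    }
}
```

Calling checkWithInterleaving() with a VM in the Migrating state and a hook that invokes failMigration() on it returns true, although the VM ends up with host_id = ORIGIN_HOST. Running the same two checks again afterwards, with an empty hook, returns false because the first check now sees the VM on the host.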
Screenshots are attached; the following query was executed in each case:
select id, name, instance_name, state, host_id, last_host_id from vm_instance;