Description
Introduction
In some production environments with multiple clusters, unused templates were found to be consuming too much storage. The cause turned out to be that template cleanup was not deleting templates marked for removal on ESXi.
Description of the problem
Suppose we have multiple clusters (c1, c2, ..., cN) in a data center and a template T from which we deploy VMs on c1.
Suppose we then expunge those VMs, so that no VM instance created from template T remains. The actual workflow is then:
- CloudStack marks the template for cleanup after storage.cleanup.interval seconds by setting marked_for_gc = 1 in the template_spool_ref table for that template.
- After another storage.cleanup.interval seconds, a DestroyCommand is sent to delete the template from primary storage.
- VmwareResource processes the command. It first picks a random cluster, say ci != c1, in which to look for the template VM (using the volume's path) and destroy it. Because the template is on c1, it is not found and therefore not deleted. The entry in template_spool_ref is deleted, but the actual template on the hypervisor side is not.
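The failure in the last step can be illustrated with a small toy model. This is only a sketch under stated assumptions: the inventory dictionary, the function name `destroy_in_random_cluster`, and the template name are hypothetical and do not correspond to actual CloudStack or vSphere code.

```python
import random

# Hypothetical inventory: cluster name -> set of template VM names hosted there.
clusters = {"c1": {"template-T"}, "c2": set(), "c3": set()}


def destroy_in_random_cluster(inventory, template, rng):
    """Toy version of the described behaviour: search one randomly chosen cluster only."""
    chosen = rng.choice(sorted(inventory))
    if template in inventory[chosen]:
        inventory[chosen].remove(template)
        return chosen, True
    # The template lives in a different cluster, so nothing is deleted here.
    return chosen, False


chosen, deleted = destroy_in_random_cluster(clusters, "template-T", random.Random())
print(f"searched {chosen}, deleted={deleted}")
print("template still on c1:", "template-T" in clusters["c1"])
```

In this model the template is removed only when the random choice happens to be c1. In every other case it stays on the hypervisor, which matches the symptom reported above.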
Proposed solution
We propose to address the problem described in the third step above: instead of picking a random cluster in which to look for the template VM, search the whole vSphere data center. This ensures the template VM is deleted in every case, regardless of which cluster hosts it.
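The idea behind the proposal can be sketched with a toy model. This is a minimal illustration, not the actual change to VmwareResource: the inventory dictionary and the function name `destroy_in_datacenter` are hypothetical, and a real implementation would query the vSphere data center instead of iterating over a Python dictionary.

```python
def destroy_in_datacenter(inventory, template):
    """Toy data-center-wide lookup: check every cluster and remove the template where found."""
    removed_from = []
    for name in sorted(inventory):
        if template in inventory[name]:
            inventory[name].remove(template)
            removed_from.append(name)
    return removed_from


# Hypothetical data center: cluster name -> set of template VM names hosted there.
datacenter = {"c1": {"template-T"}, "c2": set(), "c3": set()}
print(destroy_in_datacenter(datacenter, "template-T"))  # ['c1']
```

Because the lookup no longer depends on which cluster is selected, the result is the same on every run.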