CloudStack / CLOUDSTACK-8643

Helper for KVM High Availability


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: Future
    • Component/s: KVM, Management Server
    • Security Level: Public
    • Environment: KVM hypervisors

    Description

      When running KVM with NFS storage, every Agent writes a heartbeat to the NFS share.

      Should an Agent go down, it will still be writing heartbeats, even if libvirt has died.

      Using these heartbeats, the Management Server can ask the other KVM Agents whether the host in question is still beating. If it is not, the Management Server can fence it.
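      The mechanism above can be made concrete with a minimal Python sketch of a file-based heartbeat. It assumes one file per host in a directory on the shared NFS mount. The directory layout, the file format and the names `write_heartbeat` and `is_beating` are hypothetical. This is not CloudStack's actual heartbeat script. It only illustrates the idea: one host writes, and a neighbouring host checks how fresh the write is.

```python
import os
import tempfile
import time
from typing import Optional

# Hypothetical layout: one heartbeat file per host, named after the host,
# inside a directory on the shared NFS mount. Not CloudStack's actual script.


def write_heartbeat(hb_dir: str, host_id: str, now: Optional[float] = None) -> None:
    """Run periodically on each host: store the current Unix time in <hb_dir>/<host_id>."""
    ts = time.time() if now is None else now
    path = os.path.join(hb_dir, host_id)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(f"{ts:.0f}\n")
    os.replace(tmp, path)  # atomic rename: readers never see a half-written file


def is_beating(hb_dir: str, host_id: str, max_age: float,
               now: Optional[float] = None) -> bool:
    """Run on a *neighbouring* host: has <host_id> written a heartbeat recently?"""
    path = os.path.join(hb_dir, host_id)
    try:
        with open(path) as f:
            ts = float(f.read().strip())
    except (OSError, ValueError):
        return False  # no file or unreadable content: treat as not beating
    current = time.time() if now is None else now
    return current - ts <= max_age


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_heartbeat(d, "host-1", now=1000)
        print(is_beating(d, "host-1", max_age=60, now=1030))  # True
        print(is_beating(d, "host-1", max_age=60, now=1100))  # False
```

      The `now` parameter exists only so the freshness check can be demonstrated without waiting. In real use both functions fall back to the wall clock.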

      While this works, I have also encountered scenarios where you run without NFS and still want investigators.

      My proposal is an Agent Helper running NEXT to the Agent itself.

      It would be a simple Python daemon running a basic HTTP server that queries libvirt every X seconds for:

      • Running Instances
      • Storage pools

      It keeps this state in memory, so that even when libvirt goes down, it knows what the last state was.
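      The following is a minimal sketch of such a helper, assuming Python 3.7+ and only the standard library. The libvirt query is injected as a callable, which lets the caching logic be shown without libvirt. A possible real poller using the libvirt-python bindings is left as a comment. `StateCache`, the JSON fields and port 8999 are hypothetical choices, not an existing CloudStack API. The point being illustrated is that a failed poll keeps the previous state and only sets `libvirt_ok` to false.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Dict, List, Optional

State = Dict[str, List[str]]  # {"instances": [...], "pools": [...]}


class StateCache:
    """Holds the last state that was successfully read from libvirt."""

    def __init__(self, poll: Callable[[], State]) -> None:
        self._poll = poll
        self._lock = threading.Lock()
        self._state: State = {"instances": [], "pools": []}
        self._last_ok: Optional[float] = None
        self._libvirt_ok = False

    def refresh(self) -> None:
        try:
            state = self._poll()
        except Exception:
            # libvirt unreachable: keep the previous state, only record the failure
            with self._lock:
                self._libvirt_ok = False
            return
        with self._lock:
            self._state = state
            self._last_ok = time.time()
            self._libvirt_ok = True

    def snapshot(self) -> dict:
        with self._lock:
            return {**self._state, "libvirt_ok": self._libvirt_ok,
                    "last_ok": self._last_ok}


def poll_forever(cache: StateCache, interval: float, stop: threading.Event) -> None:
    """Refresh the cache every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        cache.refresh()
        stop.wait(interval)


def make_handler(cache: StateCache):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            # Every GET returns the cached state; libvirt is never touched here.
            body = json.dumps(cache.snapshot()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


def serve(cache: StateCache, port: int, interval: float) -> None:
    """Start the poller thread and serve HTTP until interrupted."""
    stop = threading.Event()
    threading.Thread(target=poll_forever, args=(cache, interval, stop),
                     daemon=True).start()
    try:
        ThreadingHTTPServer(("0.0.0.0", port), make_handler(cache)).serve_forever()
    finally:
        stop.set()


# A real poller, assuming the libvirt-python bindings are installed:
#
# def libvirt_poll() -> State:
#     import libvirt
#     conn = libvirt.openReadOnly("qemu:///system")
#     try:
#         doms = conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE)
#         return {"instances": [d.name() for d in doms],
#                 "pools": [p.name() for p in conn.listAllStoragePools()]}
#     finally:
#         conn.close()
#
# serve(StateCache(libvirt_poll), port=8999, interval=10)  # port is hypothetical
```

      Keeping the HTTP handler away from libvirt is deliberate. An investigator still gets an answer, with the last known state, while libvirt is hung or dead.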

      Using the QEMU monitor sockets, we can check whether the guests we have in memory are still online.

      If they are, we simply keep the list.
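      Below is a sketch of the monitor-socket check. It assumes the helper recorded each guest's monitor socket path while libvirt was still reachable. Where libvirt places these sockets differs between versions, so the paths are an assumption. The probe only connects and disconnects, without sending any monitor commands. A refused connection on a leftover socket file means QEMU is gone. `monitor_alive` and `prune_guests` are hypothetical names. Whether probing a libvirt-managed monitor this way is safe in production still needs to be verified.

```python
import os
import socket
import tempfile
from typing import Dict


def monitor_alive(sock_path: str, timeout: float = 1.0) -> bool:
    """Connect-only probe: True if a process is still listening on the socket.

    No QMP/HMP command is sent; a successful connect() only shows that the
    process owning the socket (normally QEMU) is still alive.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(sock_path)
        return True
    except OSError:
        # Missing file, ECONNREFUSED (stale socket, QEMU gone) or timeout.
        return False
    finally:
        s.close()


def prune_guests(guests: Dict[str, str]) -> Dict[str, str]:
    """Keep only guests (name -> monitor socket path) whose monitor still answers."""
    return {name: path for name, path in guests.items() if monitor_alive(path)}


if __name__ == "__main__":
    # Demo with a throwaway listener standing in for a QEMU monitor socket.
    d = tempfile.mkdtemp()
    path = os.path.join(d, "monitor.sock")
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(16)
    print(prune_guests({"vm-a": path, "vm-b": os.path.join(d, "gone.sock")}))
    srv.close()
    print(monitor_alive(path))  # False: stale socket file, nobody listening
```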

      Now, if an investigator wants to know whether the host is still up, it can ALSO ask the helper.

      The Management Server can ask the helper, but so could the other Agents.
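      On the investigator side (the Management Server or another Agent), a query could look like the following sketch. It assumes the helper returns JSON with an `instances` list on a hypothetical port 8999. The rule in `classify` is an assumption, not CloudStack's investigator contract. A helper that answers is taken as proof that the host is up. A silent helper yields no verdict rather than a fencing decision.

```python
import http.client
import json
import urllib.request
from typing import List, Optional, Tuple

HELPER_PORT = 8999  # hypothetical port for the helper daemon


def ask_helper(host: str, port: int = HELPER_PORT,
               timeout: float = 5.0) -> Optional[dict]:
    """Fetch the helper's JSON report, or None if it cannot be reached or parsed."""
    # Bypass any configured HTTP proxy: this is a direct host-to-host check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://{host}:{port}/", timeout=timeout) as resp:
            return json.loads(resp.read())
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and timeouts are OSErrors; bad JSON is a ValueError.
        return None


def classify(report: Optional[dict], vm_names: List[str]) -> Tuple[str, List[str]]:
    """Hypothetical rule: an answering helper proves the host OS is still up."""
    if report is None:
        # Helper silent: no conclusion, fall back to other investigators.
        return "Unknown", []
    running = set(report.get("instances", []))
    # Also report which of the VMs under investigation the helper still sees.
    return "Up", sorted(n for n in vm_names if n in running)
```

      An investigator would call something like `classify(ask_helper("10.0.0.5"), ["i-2-10-VM"])`, where the address is only an example. In this sketch a silent helper returns `Unknown` rather than `Down`. The helper cannot cover every failure mode, so on its own it should not be the reason to fence.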

      This does not work in all cases, e.g. when storage is lost, but an additional helper would be useful to catch scenarios where the Agent itself has become unresponsive.


            People

              Assignee: Unassigned
              Reporter: Wido den Hollander (widodh)
              Votes: 0
              Watchers: 2
