VCL-1023

Cluster reservations may fail to copy an image if assigned to multiple VM hosts sharing a datastore



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.2
    • Fix Version/s: 2.5
    • Component/s: vcld (backend)
    • Labels: None



      The problem occurs when all of the following are true:

      • The request is a cluster request
      • Multiple reservations are assigned the same image revision
      • The reservations are assigned to VMs on different VMware ESXi hosts
      • The VMware ESXi hosts share a common virtual disk image datastore
      • The image does not yet exist on the datastore and needs to be copied from the repository

      Each vcld process checks whether the image needs to be copied from the repository to the datastore. Because the same image revision is assigned to multiple reservations in the cluster request, multiple vcld processes each determine that the image needs to be copied.

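      The code does obtain a semaphore before attempting to copy the image. However, the semaphore name is based on both the VM host name and the image name, so processes assigned to different VM hosts create differently named semaphores and do not block each other: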

      2017-03-14 00:25:46|18904|3115170|3222911|new|Module.pm:get_semaphore|1601|created 'blade1a1-13-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 557fdf0
      2017-03-14 00:25:46|18908|3115170|3222912|new|Module.pm:get_semaphore|1601|created 'blade1a1-8-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5023f10
      2017-03-14 00:25:47|18913|3115170|3222914|new|Module.pm:get_semaphore|1601|created 'blade1a1-9-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5024518
      2017-03-14 00:25:47|18926|3115170|3222918|new|Module.pm:get_semaphore|1601|created 'blade1a1-3-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 50256d0
      2017-03-14 00:26:12|18930|3115170|3222919|new|Module.pm:get_semaphore|1601|created 'blade1a1-11-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5021988
      2017-03-14 00:31:18|18917|3115170|3222916|new|Module.pm:get_semaphore|1601|created 'blade1a1-13-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5578c60
      2017-03-14 00:31:24|18922|3115170|3222917|new|Module.pm:get_semaphore|1601|created 'blade1a1-3-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 4493e78

      The first 5 processes were assigned to 5 different VM hosts, so each obtained its own semaphore without waiting, all within 30 seconds of each other. Each then attempted to copy the .vmdk to the same shared directory.
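The collision can be checked directly from the log excerpt. The following is a minimal sketch, not part of vcld: it assumes the second pipe-delimited field is the process ID and the eighth is the message text, and it assumes the semaphore name is the VM host name, a hyphen, and the image path on the datastore. The names `group_by_semaphore` and `destination_of` are hypothetical.

```python
import re
from collections import defaultdict

# The seven get_semaphore lines from the log excerpt above.
LOG_LINES = [
    "2017-03-14 00:25:46|18904|3115170|3222911|new|Module.pm:get_semaphore|1601|created 'blade1a1-13-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 557fdf0",
    "2017-03-14 00:25:46|18908|3115170|3222912|new|Module.pm:get_semaphore|1601|created 'blade1a1-8-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5023f10",
    "2017-03-14 00:25:47|18913|3115170|3222914|new|Module.pm:get_semaphore|1601|created 'blade1a1-9-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5024518",
    "2017-03-14 00:25:47|18926|3115170|3222918|new|Module.pm:get_semaphore|1601|created 'blade1a1-3-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 50256d0",
    "2017-03-14 00:26:12|18930|3115170|3222919|new|Module.pm:get_semaphore|1601|created 'blade1a1-11-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5021988",
    "2017-03-14 00:31:18|18917|3115170|3222916|new|Module.pm:get_semaphore|1601|created 'blade1a1-13-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5578c60",
    "2017-03-14 00:31:24|18922|3115170|3222917|new|Module.pm:get_semaphore|1601|created 'blade1a1-3-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 4493e78",
]

CREATED_RE = re.compile(r"created '(?P<name>.+)' Semaphore object")


def group_by_semaphore(lines):
    """Map each semaphore name to the PIDs that created a semaphore with that name."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split("|", 7)  # assumed layout: 8 pipe-delimited fields
        if len(fields) < 8:
            continue
        match = CREATED_RE.search(fields[7])
        if match:
            groups[match.group("name")].append(int(fields[1]))
    return dict(groups)


def destination_of(semaphore_name):
    """Drop the assumed '<vm host>-' prefix, leaving the image path on the datastore."""
    _host, _sep, rest = semaphore_name.partition("-/")
    return "/" + rest


groups = group_by_semaphore(LOG_LINES)
destinations = {destination_of(name) for name in groups}
print(len(groups), "semaphore names,", len(destinations), "destination")
# prints: 5 semaphore names, 1 destination
```

Five differently named semaphores guard a single destination path, which is why the first five copies were not serialized.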

      The last 2 processes obeyed the semaphore and waited several minutes because each was assigned to the same VM host as an earlier process (blade1a1-13 and blade1a1-3). Once the earlier process on that host finished attempting to copy the .vmdk and released the semaphore, each of the last 2 processes checked whether the copy was still necessary. This is how all processes copying to the same destination are supposed to behave.
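The intended behavior (wait for the semaphore, then re-check whether the copy is still needed) can be sketched as follows. This is an illustration only, not the vcld code: threads stand in for the separate vcld processes, a `threading.Lock` stands in for the semaphore, and a Python set stands in for the shared datastore. The names `copy_image_if_missing` and `run_demo` are hypothetical.

```python
import threading


def copy_image_if_missing(datastore, image, lock, copy_log):
    """Copy the image at most once, even when several workers call this concurrently."""
    with lock:  # block while another worker holds the lock and may be copying
        if image in datastore:  # re-check after waiting: the copy may already be done
            return False
        copy_log.append(image)  # stand-in for the real repository-to-datastore copy
        datastore.add(image)
        return True


def run_demo(worker_count):
    """Start worker_count workers that all need the same image; return the copies made."""
    datastore = set()   # stand-in for the shared datastore contents
    copy_log = []       # one entry per copy actually performed
    shared_lock = threading.Lock()  # one lock shared by every worker for this destination
    image = "vmwarelinux-RHEL6hkim224213-v3"
    workers = [
        threading.Thread(
            target=copy_image_if_missing,
            args=(datastore, image, shared_lock, copy_log),
        )
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return copy_log


print(len(run_demo(7)))  # prints: 1
```

The pattern only works when every worker that writes to the same destination waits on the same lock, which is the condition the host-based semaphore name fails to meet.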

      The code should be updated so that the semaphore name identifies the copy destination rather than the VM host. The name should be built from the datastore UUID and the image revision name.
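The effect of the proposed naming can be compared with the current naming in a short sketch. It assumes the current name is the VM host name, a hyphen, and the image path, as the log lines suggest. The function names and the UUID value are hypothetical placeholders, and the exact format used in the actual fix may differ.

```python
def host_based_name(vm_host, image_path):
    """Naming seen in the log: differs for every VM host."""
    return f"{vm_host}-{image_path}"


def datastore_based_name(datastore_uuid, image_revision):
    """Proposed naming: identical for every host that shares the datastore."""
    return f"{datastore_uuid}-{image_revision}"


# VM hosts taken from the log excerpt; all of them mount the same datastore.
VM_HOSTS = ["blade1a1-13", "blade1a1-8", "blade1a1-9", "blade1a1-3", "blade1a1-11"]
IMAGE_REVISION = "vmwarelinux-RHEL6hkim224213-v3"
IMAGE_PATH = "/vmfs/volumes/datastore-1a1_mcnc-fas2554/" + IMAGE_REVISION
DATASTORE_UUID = "hypothetical-datastore-uuid"  # placeholder, not a real UUID

old_names = {host_based_name(host, IMAGE_PATH) for host in VM_HOSTS}
new_names = {datastore_based_name(DATASTORE_UUID, IMAGE_REVISION) for _ in VM_HOSTS}

print(len(old_names), len(new_names))  # prints: 5 1
```

With one name per datastore and image revision, every process copying to that destination would wait on the same semaphore, regardless of which VM host its reservation is assigned to.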




            arkurth Andrew Kurth