Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
I have seen on occasion VM new or reload requests fail because the code detects that VMware failed to power the VM on. The first warning that appears in vcld.log looks like this:
VIM_SSH.pm:_run_vim_cmd|293| ---- WARNING ----
VIM_SSH.pm:_run_vim_cmd|293| 2017-08-05 12:02:29VIM_SSH.pm:_run_vim_cmd|293|attempt 1/5: command failed on VM host <VMware host>: vim-cmd vmsvc/power.on 369, exit status: 1, output:
VIM_SSH.pm:_run_vim_cmd|293| Powering on VM:
VIM_SSH.pm:_run_vim_cmd|293| Power on failed
VIM_SSH.pm:_run_vim_cmd|293| ( 0) VIM_SSH.pm, _run_vim_cmd (line: 293)
VIM_SSH.pm:_run_vim_cmd|293| (-1) VIM_SSH.pm, vm_power_on (line: 1224)
VIM_SSH.pm:_run_vim_cmd|293| (-2) VMware.pm, power_on (line: 6861)
VIM_SSH.pm:_run_vim_cmd|293| (-3) VMware.pm, load (line: 751)
VIM_SSH.pm:_run_vim_cmd|293| (-4) new.pm, reload_image (line: 588)
VIM_SSH.pm:_run_vim_cmd|293| (-5) new.pm, process (line: 275)
VIM_SSH.pm:_run_vim_cmd|293| (-6) vcld, make_new_child (line: 601)
VIM_SSH.pm:_run_vim_cmd|293| (-7) vcld, main (line: 353)
The Power on failed message may be true in some cases. However, I've seen this returned when the VM, in fact, did power on when the first attempt is made.
The output of the vim-cmd vmsvc/power.on command does not indicate that the VM is already powered on.
blade1a2-7:/vmfs/volumes/5900b99c-beb1caa4-1e81-3440b5dc1a7c/vs1-21_4513-v3 $ vim-cmd vmsvc/power.on 369
Powering on VM:
Power on failed
blade1a2-7:/vmfs/volumes/5900b99c-beb1caa4-1e81-3440b5dc1a7c/vs1-21_4513-v3 $ vim-cmd vmsvc/power.getstate 369
Retrieved runtime info
Powered on
Regardless, the VIM_SSH.pm::_run_vim_cmd subroutine then makes 4 more attempts. All subsequent attempts fail because the VM is already powered on and the reload eventually fails:
2017-08-05 12:02:29VIM_SSH.pm:_run_vim_cmd|203|sleeping 5 seconds before making attempt 2/5
... (4 additional power on attempts are made)
VIM_SSH.pm:_run_vim_cmd|309| ---- WARNING ----
VIM_SSH.pm:_run_vim_cmd|309| 2017-08-05 12:02:52VIM_SSH.pm:_run_vim_cmd|309|failed to run VIM command on VM host <VMware host>: 'vim-cmd vmsvc/power.on 369', made 5 attempts
...
VMware.pm:load|752| ---- WARNING ----
VMware.pm:load|752| 2017-08-05 12:02:52VMware.pm:load|752|failed to power on VM <VM> on VM host: <VMware host>
...
new.pm:reload_image|596| ---- WARNING ----
new.pm:reload_image|596| 2017-08-05 12:02:52new.pm:reload_image|596|<image> failed to load on <VM>, returning
...
new.pm:process|325| ---- WARNING ----
new.pm:process|325| 2017-08-05 12:02:52new.pm:process|325|failed to load <VM> with <image>
...
State.pm:reservation_failed|461| ---- CRITICAL ----
State.pm:reservation_failed|461| 2017-08-05 12:02:52State.pm:reservation_failed|461|reservation failed on <VM>: process failed after trying to load or make available
At a minimum, the VMware.pm::power_on subroutine should check the power state of the VM after its call to the API module's vm_power_on subroutine fails. This would prevent the reservation to fail.
A better solution would be for a single attempt to be made followed by a power status check if it indicates the power on operation failed. If the VM is indeed off, make additional power on attempts.