Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
2.2.1
None
Description
SSH processes issued from the management node to the computer being loaded occasionally hang for a very long time or indefinitely. This causes the reservation process to hang.
This problem usually occurs shortly after a reloaded computer first begins to respond to SSH. vcld detects that the computer is responding and begins to issue commands, but the SSH service/daemon is probably still being initialized. The SSH command hangs rather than failing because it makes an initial connection, a hiccup occurs, and the SSH service on the computer then continues to run normally. Setting SSH options such as ServerAlive* or TCPKeepAlive doesn't help because the computer still responds to these keepalive messages.
Code should be added to time out the SSH command process after a configurable amount of time.
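The idea of a process-level timeout can be sketched roughly as follows. This is only an illustration in Python, not the code used in vcld; the function names run_with_timeout and build_ssh_argv, the returned tuple layout, and the host name in the usage note are all hypothetical. The point is that the limit is enforced on the local ssh process by the caller, so it still fires when the remote side keeps answering keepalive messages.

```python
import subprocess


def build_ssh_argv(host, command):
    """Build an ssh command line (hypothetical helper, for illustration only).

    BatchMode=yes stops ssh from waiting on a password prompt, and
    ConnectTimeout bounds only the connection phase, not the whole command.
    """
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, command]


def run_with_timeout(argv, timeout_seconds):
    """Run argv and give up after timeout_seconds.

    Returns (exit_status, output, timed_out). exit_status is None when the
    process had to be killed because it exceeded the time limit.
    """
    try:
        completed = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,   # never block waiting for input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # merge stderr into the captured output
            timeout=timeout_seconds,
            text=True,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() kills the child before raising TimeoutExpired.
        partial = exc.output or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return None, partial, True
    return completed.returncode, completed.stdout, False
```

A caller might invoke it as run_with_timeout(build_ssh_argv("node1", "uptime"), 60), where the 60 seconds would come from configuration, and then treat timed_out as a failed attempt that can be retried instead of letting the reservation hang.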