Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
2.2.0
-
None
Description
On the 1600-node cluster, I attempted Stop-All and Start-All actions. During both actions, the request would abort itself, sometimes even when there were no failures or timeouts. During this time, hosts would also go into the heartbeat-lost state, and the cluster would look like it was broken. After about 5-10 minutes, the hosts would slowly come back online and the correct host status would be reported. However, because the request was aborted, some components would be stopped or started when they should not have been.
Attachments