Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Version: 0.8.0
None
None
Description
In common/base, NodeDriver._run_deployment_script has the following retry wrapper:
    tries = 0
    while tries < max_tries:
        try:
            node = task.run(node, ssh_client)
        except Exception:
            tries += 1
            if tries >= max_tries:
                raise LibcloudError(value='Failed after %d tries'
                                    % (max_tries), driver=self)
        else:
            ssh_client.close()
            return node
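The except Exception clause swallows every error without reporting it, and the LibcloudError raised at the end says only how many tries were made, which makes debugging very hard.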
Furthermore, max_tries is effectively hard-coded in deploy_node():
    self._run_deployment_script(task=kwargs['deploy'],
                                node=node,
                                ssh_client=ssh_client,
                                max_tries=3)
... which forces anyone who wants to control the number of retries to write their own deploy_node().
Suggestions:
- at a minimum, log or warn about the error that is caught in the retry loop
- better yet, make the catch more fine-grained, so that errors we know are not retryable fail immediately
- consider making the default value of max_tries 1
- make max_tries controllable from deploy_node()
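
The suggestions above could be combined roughly as follows. This is a minimal standalone sketch, not libcloud code: run_with_retries, DeploymentError (a stand-in for LibcloudError) and the NON_RETRYABLE tuple are hypothetical names invented for illustration, and which exception types count as non-retryable is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


class DeploymentError(Exception):
    """Hypothetical stand-in for LibcloudError."""


# Assumption: programming errors like these cannot be fixed by retrying.
NON_RETRYABLE = (TypeError, AttributeError, NameError)


def run_with_retries(run, max_tries=3, non_retryable=NON_RETRYABLE):
    """Call run() up to max_tries times, logging every caught failure."""
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return run()
        except non_retryable:
            # Fail immediately and keep the original traceback.
            raise
        except Exception as exc:
            last_error = exc
            logger.warning('Attempt %d of %d failed: %r',
                           attempt, max_tries, exc)
    # Include the last error so the final message is not opaque.
    raise DeploymentError('Failed after %d tries; last error: %r'
                          % (max_tries, last_error))


# Usage: a task that fails twice with a transient error, then succeeds.
calls = []


def flaky_task():
    calls.append(1)
    if len(calls) < 3:
        raise IOError('connection reset')
    return 'deployed'


result = run_with_retries(flaky_task, max_tries=5)
```

In deploy_node(), the equivalent change would be to read the retry count from a keyword argument and pass it through in place of the literal 3. That would let callers choose the count without reimplementing the method.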