Description
When Whirr fails to start a cluster completely, it signals this with a non-zero return code. In many partial-failure scenarios (currently intermittent), some resources remain active and are not cleaned up; in my experience these are EC2 machine instances.
In the cases where I have seen orphaned nodes, the log contains "IOException: Too many instance failed while bootstrapping!".
A non-zero return code should guarantee that all resources have been cleaned up. Without this post-condition, every such failure requires manual inspection and cleanup to stop needless expenses. That is why I marked this bug critical: it needs to be addressed before Whirr can safely be triggered from any kind of cron job.
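To make the requested post-condition concrete, here is a minimal sketch of the behaviour I have in mind. It does not use Whirr's actual classes: the Provisioner interface, the FakeProvisioner stand-in, and the specific exit codes are all hypothetical names invented for illustration. The point is only that a failed launch tears down whatever was started before the non-zero code is returned.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CleanupOnFailure {

    /** Hypothetical provisioning interface; NOT part of Whirr's API. */
    interface Provisioner {
        void launch() throws IOException; // may fail after some nodes are already running
        void destroyAll();                // assumed safe to call on a partially started cluster
        int activeNodes();
    }

    /**
     * Launches the cluster and returns a process-style exit code.
     * Exit codes are an assumption of this sketch:
     * 0 = success, 1 = launch failed and cleanup succeeded,
     * 2 = launch failed and cleanup also failed (manual inspection still needed).
     */
    static int launchWithCleanup(Provisioner p) {
        try {
            p.launch();
            return 0;
        } catch (IOException launchError) {
            System.err.println("Launch failed: " + launchError.getMessage());
            try {
                // Tear down any nodes that were started before the failure.
                p.destroyAll();
            } catch (RuntimeException cleanupError) {
                System.err.println("Cleanup failed: " + cleanupError.getMessage());
                return 2;
            }
            return 1;
        }
    }

    /** In-memory stand-in that simulates a partial bootstrap failure. */
    static class FakeProvisioner implements Provisioner {
        private final List<String> nodes = new ArrayList<>();
        private final boolean failBootstrap;

        FakeProvisioner(boolean failBootstrap) {
            this.failBootstrap = failBootstrap;
        }

        @Override
        public void launch() throws IOException {
            // Two nodes come up before bootstrapping is checked.
            nodes.add("node-1");
            nodes.add("node-2");
            if (failBootstrap) {
                throw new IOException("Too many instance failed while bootstrapping!");
            }
        }

        @Override
        public void destroyAll() {
            nodes.clear();
        }

        @Override
        public int activeNodes() {
            return nodes.size();
        }
    }

    public static void main(String[] args) {
        FakeProvisioner p = new FakeProvisioner(true);
        int code = launchWithCleanup(p);
        System.out.println("exit code=" + code + ", active nodes=" + p.activeNodes());
    }
}
```

With this shape, a cron job only has to check the return code: under the codes assumed above, 1 means nothing is left running, and only 2 would still call for manual inspection.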