Description
While using Whirr, I ran into several problems. In particular, if any step of installing or configuring a service hits an exception or error, Whirr cannot bring up a running cluster. In other cases, entire clusters sometimes fail to start under Whirr for unknown reasons. In both cases, users have to recover the cluster manually, which is a real burden.
However, the current implementation does not provide any way to recover. In addition, the service scripts (install and post-configure) bundle too many tasks. For example, for the hadoop service, the configuration script mounts devices, configures the *-site.xml files, and executes the service start script. This makes the scripts hard for users to reuse.
Thus, I propose that we split both scripts (i.e., install and post-configure) into finer-grained, more normalized functions, such as service-clean, service-install, service-configuration, service-start, and service-stop. In addition, a user should be able to execute each of them individually.
I expect this to give us the following benefits: 1) it makes debugging easier; 2) when an exception or error occurs while starting a service, a user can easily recover the cluster by running the appropriate functions.
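
To make the idea concrete, here is a minimal sketch of how individually runnable phases could support recovery. It is only an illustration in Python, not Whirr code: the `ServiceLifecycle` class, the `STARTUP_ORDER` list, and the handler functions are hypothetical names, and the phase names are shortened from the ones proposed above. The assumption is that each phase can be re-run on its own after a failure.

```python
from typing import Callable, Dict, List

# Hypothetical startup order, loosely following the proposed function names.
# service-stop is deliberately left out because it is not part of bring-up.
STARTUP_ORDER: List[str] = ["clean", "install", "configure", "start"]


class ServiceLifecycle:
    """Runs lifecycle phases one at a time and records which ones completed."""

    def __init__(self, handlers: Dict[str, Callable[[], None]]) -> None:
        self.handlers = handlers
        self.completed: List[str] = []

    def run(self, phase: str) -> None:
        """Run a single phase on its own; an exception propagates to the caller."""
        self.handlers[phase]()
        self.completed.append(phase)

    def run_from(self, phase: str) -> None:
        """Run the given phase and every later startup phase, in order."""
        for name in STARTUP_ORDER[STARTUP_ORDER.index(phase):]:
            self.run(name)


def demo() -> List[str]:
    """Simulate a configure step that fails once, then recover from that step."""
    log: List[str] = []
    attempts = {"configure": 0}

    def configure() -> None:
        attempts["configure"] += 1
        if attempts["configure"] == 1:
            raise RuntimeError("simulated configuration failure")
        log.append("configure")

    handlers: Dict[str, Callable[[], None]] = {
        "clean": lambda: log.append("clean"),
        "install": lambda: log.append("install"),
        "configure": configure,
        "start": lambda: log.append("start"),
        "stop": lambda: log.append("stop"),
    }
    service = ServiceLifecycle(handlers)
    try:
        service.run_from("clean")
    except RuntimeError:
        # Recovery: resume at the failed phase instead of rebuilding everything.
        service.run_from("configure")
    return log


if __name__ == "__main__":
    print(demo())
```

In this sketch the install step is not repeated after the failure; only the failed phase and the ones after it are re-run, which is the kind of recovery the proposal is aiming for.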