That's an interesting point -and as one of the people in that QFS discussion, I clearly have opinions.
In favour of pulling out:
- lets people have a faster release cycle and push things out earlier.
- decouples whirr-core from the various services, which can come in their own JARs
- puts the homework of keeping the scripts up to date onto the owners of the source
In favour of retention:
- build and test process can run against all the various services in one go
- forces whirr-core to have stability at both the java level and at the installed-functions level, including for services that may be subclassed (e.g. hadoop itself)
- keeping them together makes it easier to synchronize fixes across the different components. E.g. if a bug is found in the hdp functions, it probably exists in the cdh package too.
- no artifacts are actually being redistributed, just the code to get them out and then tell them to start themselves.
My (current) view is that those services that directly subclass other services -such as the specific hadoop installers, hbase &c, ought to go into whirr so they can stay in sync with the parent classes.
At the same time, you'd have to look hard at other services and say "you are more loosely coupled, why don't you stay with your project?".
As an example of this, I also have on github the code to deploy the Ambari management tooling onto a whirr-created cluster; it can then take on the work of provisioning and managing the worker nodes [ https://github.com/steveloughran/whirr/tree/ambari ]. This is currently in the same service as the hdp1 patch above, but could trivially be moved to its own service tree, where it could be built and tested by that team. It has no dependencies other than whirr-core, and I do think it would be better off staying outside the whirr codebase -as it is arguably more tightly coupled to the ambari project than to whirr.
w.r.t. Bigtop, yes, its RPMs & DEBs should go in. What could be done (here comes another JIRA, I feel) is to tweak the Hadoop component setup with a few more override points to make this easier. Example: allow for a "setup the RPM repo" script that is independent of what you are installing, plus some tests that are designed to work against different hadoop installations.
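To make the idea concrete, here is a minimal sketch of what such a repo-setup override point could look like -function names and layout are illustrative, not existing Whirr API. The point is that the step only knows about the repository, never about which Hadoop distribution will be installed from it:

```shell
# Hypothetical override point: emit the repo definition for a named
# repository; the caller decides where to write it on the target node
# (e.g. /etc/yum.repos.d/${name}.repo).
make_yum_repo() {
  local name="$1" baseurl="$2"
  printf '[%s]\nname=%s\nbaseurl=%s\nenabled=1\ngpgcheck=0\n' \
    "$name" "$name" "$baseurl"
}

# The DEB equivalent: emit an apt sources.list line for the same repo.
make_apt_source() {
  local baseurl="$1" dist="$2"
  printf 'deb %s %s contrib\n' "$baseurl" "$dist"
}
```

A distribution-specific installer (bigtop, cdh, hdp) would then only have to supply a repo name and URL, and the same install and test logic could run unchanged against any of them.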
Maybe if we can get a bigtop rpm/deb installer that works well, the CDH & HDP installers would become extensions of that -and it would become easier to keep them out of the core.