Allen, I did extensive studies of all existing systems, including puppet, mcollective, chef, cfengine, controlTier, and Bcfg2. Most configuration management systems focus on generating a set of templates and config parameters and pushing out changes one node at a time. This works fine for a small number of machines, but most of these systems fail beyond 1800 nodes or become difficult to maintain.
We use tar, ssh, wget, rsync & gpg with a custom roles system (https://computing.llnl.gov/linux/genders.html can be an alternative) to manage configuration and packages. Our environment is probably still too small to hit the limits of these tools.
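For illustration only, here is a minimal sketch of what a role lookup of this kind could look like. It is not our actual roles system and not the genders implementation; the host names, role names, and the simplified "host role,role" layout are all assumptions made for the example.

```python
# Hypothetical role map: one "host role1,role2" entry per line.
# Host and role names are made up for this example.
ROLES = """
nn1  namenode,master
dn1  datanode,worker
dn2  datanode,worker   # trailing comments are ignored
"""

def parse_roles(text):
    """Return {host: set_of_roles} parsed from 'host role1,role2' lines."""
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        host = parts[0]
        attrs = parts[1] if len(parts) > 1 else ""
        roles = {a.strip() for a in attrs.split(",") if a.strip()}
        hosts.setdefault(host, set()).update(roles)
    return hosts

def hosts_with(hosts, role):
    """Return the sorted list of hosts that carry the given role."""
    return sorted(h for h, r in hosts.items() if role in r)
```

The list returned by hosts_with(parse_roles(ROLES), "datanode") can then be fed to an ssh or rsync loop to push a package or config file to just those machines.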
Our challenge with managing a hadoop cluster is the lack of standard interfaces to reliably monitor the cluster. Standard unix tools expect a process to exit with a non-zero status on error and counters to be positive numbers.
IMHO what's needed here are features like
HADOOP-6728 & HADOOP-7144, made consistent across all components and integrated with existing tools, HADOOP-7324.
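To make the exit-status point concrete, here is a minimal sketch of the kind of check a generic monitoring tool performs. The check() helper is a made-up name for this example, and it assumes the monitored command follows the unix convention of returning 0 on success and non-zero on error.

```python
import subprocess

def check(cmd):
    """Run cmd and return (ok, status); ok is True only when the exit status is 0."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0, result.returncode
```

A command that reports a failure on stdout but still exits with status 0 would pass this check, which is why consistent exit codes across all components matter.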
Zeroconf is great for resolving service location.
As part of this proposal, are there plans to update how hadoop daemon and client configurations are handled or is this specific to HMS?
Bittorrent is much faster than installing software from a yum repository for a large-scale system.
Bittorrent is a file sharing protocol and yum is a utility for rpm package management. I guess you mean to say bittorrent is faster at distributing files than http. If RPM is chosen as the package format but you don't want to use yum, HMS may need to implement another rpm-based package manager.
Alternatively, this could just be a yum plugin.
That's my 0.2 cents. But hey, if you want to invest your time in writing Yet Another Monitoring System, I wish you all the best!