...you only need to look in two places if you have an issue. If you have no idea where the master is, you have to hunt around the cluster to find it.
I'd imagine it'd be hard getting this patch in if you had no idea where the master is. (And, again, don't we have this problem now if you start up three masters and one fails? You have to hunt around. We need to build the redirect piece regardless: for example, a link to master on each server page that redirects to the current master, and a history in zk of who was master when.)
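To make the redirect and history idea concrete, here is a minimal sketch. It assumes the history is a time-sorted list of (election time, "host:port") records. The record layout, the function names, the ports and the `/master-status` path are all illustrative assumptions, not an existing HBase or ZooKeeper schema. Reading the records out of zk is left out.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical history records: (epoch seconds when elected, "host:infoport"),
# sorted by time. In a real deployment these would be read from ZooKeeper.
History = List[Tuple[int, str]]


def master_at(history: History, ts: int) -> Optional[str]:
    """Return the server that was master at time ts, or None if ts predates the history."""
    starts = [start for start, _ in history]
    idx = bisect_right(starts, ts) - 1  # last record elected at or before ts
    return history[idx][1] if idx >= 0 else None


def redirect_url(history: History, path: str = "/master-status") -> Optional[str]:
    """Build the URL a 'link to master' would redirect to: the most recent master."""
    if not history:
        return None
    return f"http://{history[-1][1]}{path}"


# Made-up example data: the master role moved twice.
history = [(1000, "node1:60010"), (2000, "node3:60010"), (3500, "node2:60010")]
```

Every server page could serve the same link and resolve it with `redirect_url(history)`, while `master_at(history, ts)` answers "whose log do I read for an incident at time ts".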
You could even make the combined master+regionserver daemon work like our current multimaster system by giving the master role affinity for a certain set of servers.
What kind of nagios alerts would be master-specific? We need to add indirection to these now anyway (ask zk who the master is) if more than one master is running. Metrics could be a little complicated, especially if the master moved servers over the period of interest, but aren't master metrics generally of less interest, since they are mostly just aggregates and ganglia or opentsdb do a better job of this anyway?
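The indirection for an alert might look roughly like the sketch below: the check first resolves the current master, then probes whatever host it got back. `lookup_master` and `probe` are hypothetical callables supplied by the caller (for instance, one that reads the master location from zk and one that hits a status page). Only the exit codes follow the standard Nagios plugin convention.

```python
from typing import Callable, Optional, Tuple

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_master(lookup_master: Callable[[], Optional[str]],
                 probe: Callable[[str], bool]) -> Tuple[int, str]:
    """Resolve the current master first, then probe it; return (exit code, message)."""
    master = lookup_master()  # hypothetical: e.g. ask zk who the master is
    if master is None:
        return (CRITICAL, "no active master registered")
    if probe(master):         # hypothetical: e.g. fetch the master status page
        return (OK, f"master {master} is responding")
    return (CRITICAL, f"master {master} is not responding")
```

Because the check never hard-codes a hostname, the same alert definition keeps working when the master role moves to another server.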
Logs don't have to be interleaved. That's just a bit of log4j config?
Yes, it could be an issue if the daemon is bogged down. The master would be less responsive, which should be fine for short periods, but if sustained it could be an issue.
I'm not going to work on this. I do see it as something that could simplify our deploy story.