There are a few components to this.
1) The Framework: This is going to be responsible for starting up brokers and managing their failover within the Mesos cluster. It will need some Kafka-focused parameters for launching new replica brokers and for moving topics and partitions around based on what is happening in the grid over time.
2) The Scheduler: This is what is going to ask for resources for Kafka brokers (new ones, replacement ones, commissioned ones) and handle other operations such as stopping tasks (decommissioning brokers). I think this should also expose a user interface (or at least a REST API) for producers and consumers, so that they can run inside the Mesos cluster if folks want (just add the jar).
3) The Executor: This is the task launcher. It launches tasks and kills them off.
4) Sharing data between Scheduler and Executor: I looked at a few implementations of this. I like parts of the Storm implementation, but I think passing the data as environment variables through ExecutorInfo.CommandInfo.Environment.Variables is the best shot. We can have a command-line script, bin/kafka-mesos-scheduler-start.sh, that builds the contrib project if it is not already built and takes conf/server.properties to start.
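To make item 4 a bit more concrete, here is a rough sketch of how the broker settings from conf/server.properties could be carried as environment variables and recovered on the executor side. It is plain Python with no Mesos calls, so it only illustrates the mapping. The KAFKA_PROP_ prefix and all of the function names are made up for illustration, and it assumes property keys are lowercase and use dots only (no underscores) so the mapping can be reversed.

```python
# Sketch only: the KAFKA_PROP_ prefix and these helper names are hypothetical,
# not part of Kafka or Mesos.
PREFIX = "KAFKA_PROP_"


def parse_properties(text):
    """Parse the simple key=value subset of a Java .properties file."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def to_environment(props):
    """Scheduler side: turn broker properties into environment variables.

    Assumes keys are lowercase and contain dots but no underscores,
    so that from_environment() can reverse the mapping.
    """
    return {PREFIX + k.replace(".", "_").upper(): v for k, v in props.items()}


def from_environment(env):
    """Executor side: recover broker properties from the environment."""
    return {
        name[len(PREFIX):].lower().replace("_", "."): value
        for name, value in env.items()
        if name.startswith(PREFIX)
    }


def render_properties(props):
    """Executor side: write the properties back out in key=value form."""
    return "".join(f"{k}={v}\n" for k, v in sorted(props.items()))
```

The scheduler would call to_environment() on the parsed file and copy the result into the executor's environment; the executor would call from_environment() on its own environment and write the result out with render_properties() before starting the broker. Variables without the prefix (PATH and so on) are ignored.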
The Framework and operating Scheduler would run on an administrative node. I am probably going to hook Apache Curator into it so it can do its own failover to another follower. Running more than 2 should be sufficient as long as it can bring back its state (e.g. from zk). I think we can add this in later, once everything is working.
Additional detail can be found on the Wiki page: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38570672