The goal of this feature is to be able to run multiple Nimbus servers so that if one goes down another one will transparently take over. Here's what needs to happen to implement this:
1. Everything currently stored on local disk on Nimbus needs to be stored in a distributed and reliable fashion. A DFS is perfect for this. However, as we do not want to make a DFS a mandatory requirement to run Storm, the storage of these artifacts should be pluggable (default to local filesystem, but the interface should support DFS). You would only be able to run multiple NImbus if you use the right storage, and the storage interface chosen should have a flag indicating whether it's suitable for HA mode or not. If you choose local storage and try to run multiple Nimbus, one of the Nimbus's should fail to launch.
2. Nimbus's should register themselves in Zookeeper. They should use a leader election protocol to decide which one is currently responsible for launching and monitoring topologies.
3. StormSubmitter should find the Nimbus to connect to via Zookeeper. In case the leader changes during submission, it should use a retry protocol to try reconnecting to the new leader and attempting submission again.