We need ops documentation detailing what to do if the ZK server VM fails - by fail I mean the jvm process
In general a supervisor process should be used to start/stop/restart/etc... the ZK server vm.
Something like daemontools http://cr.yp.to/daemontools.html could be used, or more simply a wrapper script
should monitor the status of the pid and restart if the jvm fails. It's up to the operator, if this is not done
automatically then it will have to be done manually, by operator restarting the ZK server jvm
The inherent behavior of ZK wrt to failures - ie that it automatically recovers as long as quorum is maintained -
fits into this nicely.