In order to fence a NN, there are several different methods, at varying levels of nastiness:
1) Cooperative active->standby transition or shutdown
In the case of a manual failover, the old primary can gracefully either transition to a standby mode, or gracefully shut down. In this case, since we assume the software to be cooperative, no real "fencing" is necessary – the new NN just needs to unambiguously confirm that the old NN has dropped out of active mode.
This method succeeds only if the old NN remains in full operation.
2) Process killing or death verification (eg via ssh or a second daemon)
In the case that the old primary has either hung (eg deadlock) or crashed (eg JVM segfault), but the host is OK, the new primary may contact that host and send SIGKILL to the NameNode JVM. This may be done either via ssh or via contacting some process which is still running on the node. It is also sufficient to verify that the NN process is no longer running in the case that its JVM crashed.
This method succeeds only if the host of the old NN remains in full operation, despite the NN itself being deadlocked or crashed.
3) Storage fencing
Depending on the type of storage in which the old NN stores its edits directories, the new NN may explicitly fence the storage. This is typically accomplished using a vendor-specific extension. For example, NetApp filers support the command "exportfs -b enable save <nnhost.com> /vol/vol0" which can be remotely issued in order to disallow any further access to a particular mount by a particular host.
In the case of edits stored on BookKeeper in the future, we may be able to implement some kind of lease revocation or fencing within that storage system.
4) Network port fencing
Many switches support remote management. One way to prevent a NameNode from responding to any further requests is to forcibly disable its network port. An alternative similar mechanism is to use something like a LOM card to remotely disable the NIC.
5) Power port fencing (aka STONITH)
Many power distribution units (PDUs) support remote management. The last ditch effort to fence a node is to literally "pull the power"
Since methods 3-5 above are usually vendor-specific implementations, it does not make sense to try to implement a catch-all fencing mechanism within Hadoop. Instead, operators are likely to want to use commonly available shell scripts that work against their preferred hardware. Given this, I would propose that Hadoop's fencing behavior be:
- Configure a list of "fence methods", each with an associated priority.
- Each fence method returns an exit code indicating whether it has successfully fenced the target node.
- If any method succeeds, no further method is attempted.
- If a method fails, continue down the list to try the next method.
- If all fence methods fail, then both nodes remain in "standby" state, and an administrator must manually force the transition after verifying that the other node is no longer active.
The first fence method will always be the "cooperative" method. We can also ship with Hadoop an implementation of method #2 (shoot-the-other-process-in-the-head via ssh). Methods 3-5 would probably be fulfilled by custom site-specific shell scripts, example snippets on a wiki, or existing tools like the fence_* programs that are available from Red Hat.
- do we need to have any kind of framework for unfencing built in to Hadoop? Or is it up to an administrator to "unfence"?
- is it actually a good idea to include "Cooperative shutdown" in this same framework? or should we only call fence when we know it's uncooperative?