This patch makes the DataNode send a restart OOB ack to clients that are currently writing data.
The shutdown ordering and timing have been adjusted to give DataXceiver threads serving writes enough time to send the restart OOB ack upstream. First, DataXceiverServer is interrupted; it closes the server socket to prevent further client connections and then interrupts each DataXceiver thread. DataXceiver threads that are idling on keepalive simply terminate.
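The shutdown ordering described above can be sketched in Java. This is a minimal model under stated assumptions, not the actual DataXceiverServer code: the class XceiverServerSketch and its methods are hypothetical, and a worker idling on keepalive is modeled as a thread that sleeps until it is interrupted.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for DataXceiverServer: owns the listening socket
// and tracks the worker ("xceiver") threads it has started.
class XceiverServerSketch {
    private final ServerSocket serverSocket;
    private final List<Thread> workers = new CopyOnWriteArrayList<>();

    XceiverServerSketch() throws IOException {
        // Bind to an ephemeral loopback port; a real DataNode uses its configured address.
        serverSocket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }

    // Registers and starts a worker (in a real server, one per accepted connection).
    Thread startWorker(Runnable task) {
        Thread t = new Thread(task, "xceiver-" + workers.size());
        workers.add(t);
        t.start();
        return t;
    }

    boolean isAcceptingConnections() {
        return !serverSocket.isClosed();
    }

    // Shutdown order: close the socket first so no new clients can connect,
    // then interrupt every worker so it can send its OOB ack or simply exit.
    void shutdown() throws IOException {
        serverSocket.close();
        for (Thread t : workers) {
            t.interrupt();
        }
    }

    // Hypothetical worker idling on keepalive: it just exits when interrupted.
    static Runnable idleKeepaliveWorker() {
        return () -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Interrupted by shutdown(): nothing to send, so terminate.
            }
        };
    }
}
```

In a server with an accept loop, closing the socket first also makes any thread blocked in `accept()` fail with a `SocketException`, so no new connection is handed out while the workers are being interrupted.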
If DataNode#restarting is set, these threads send the OOB ack directly before taking down the packet responder threads. If a packet responder is in the middle of sending an ack, the OOB ack send blocks for up to a configurable timeout (1.5 seconds by default) and fails if the responder is still sending after that. If a thread has started sending but the send takes a long time (e.g. a slow client or a network issue), DataXceiverServer interrupts it after 2 seconds. DataXceiverServer tears down sooner if all DataXceiver threads finish in less than 2 seconds.
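A minimal sketch of the two timeouts above, assuming a simplified model: ResponderSketch, GracePeriodSketch and their methods are hypothetical names, not the DataNode classes. The OOB sender waits on a `sending` flag for at most the OOB timeout (1.5 seconds by default), while the server-side grace period (2 seconds) returns early once every worker has exited and interrupts any stragglers.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical model of the hand-off between a packet responder and a thread
// that wants to send a restart OOB ack on the same upstream stream.
class ResponderSketch {
    private boolean sending = false; // guarded by this
    private int oobAcksSent = 0;     // guarded by this

    // Called by the responder around each regular ack it writes upstream.
    synchronized void beginAck() {
        sending = true;
    }

    synchronized void endAck() {
        sending = false;
        notifyAll(); // wake a waiting OOB sender, if any
    }

    synchronized int oobAcksSent() {
        return oobAcksSent;
    }

    // Waits at most timeoutMs for an in-flight ack to finish, then claims the
    // stream; fails if the responder is still sending when the timeout expires.
    void sendOOBAck(long timeoutMs) throws IOException, InterruptedException {
        synchronized (this) {
            long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
            while (sending) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    throw new IOException("Could not send OOB ack within " + timeoutMs + " ms");
                }
                wait(remainingMs);
            }
            sending = true;
        }
        try {
            // A real implementation would write the OOB ack upstream here.
            synchronized (this) {
                oobAcksSent++;
            }
        } finally {
            endAck(); // release the stream again
        }
    }
}

// Hypothetical model of the server-side grace period for xceiver threads.
class GracePeriodSketch {
    // Waits up to graceMs for all workers, returning early once they have all
    // exited; workers still running at the deadline are interrupted.
    // Returns true if every worker finished within the grace period.
    static boolean awaitWorkers(List<Thread> workers, long graceMs) throws InterruptedException {
        long deadline = System.nanoTime() + graceMs * 1_000_000L;
        for (Thread t : workers) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs > 0) {
                t.join(remainingMs);
            }
        }
        boolean allDone = true;
        for (Thread t : workers) {
            if (t.isAlive()) {
                t.interrupt();
                allDone = false;
            }
        }
        return allDone;
    }
}
```

Waiting in a loop on the flag with a recomputed remaining time keeps the total wait bounded even if `wait()` wakes up spuriously.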
The IPC server is stopped later to minimize the chance of the shutdownDatanode() response being dropped. The shutdown method starts interrupting the thread pool only after a few seconds have passed since the DataXceiverServer interruption. By then all threads should have stopped; any that have not are interrupted repeatedly. This is existing behavior.
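The repeated interruption can be sketched with a `ThreadGroup`, assuming the xceiver threads all belong to one group. ThreadGroupReaper and its parameters are hypothetical, and the grace period is left as a parameter because the text only says "a few seconds".

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the final stage of DataNode shutdown: after a grace
// period, interrupt the xceiver thread group until no thread is left.
class ThreadGroupReaper {
    // Returns the number of interrupt rounds used, or -1 if threads were
    // still active after maxRounds.
    static int interruptUntilStopped(ThreadGroup group, long graceMs,
                                     long retryMs, int maxRounds)
            throws InterruptedException {
        // Threads that honored the earlier interrupt should be gone by now.
        TimeUnit.MILLISECONDS.sleep(graceMs);
        for (int round = 1; round <= maxRounds; round++) {
            if (group.activeCount() == 0) {
                return round - 1;
            }
            group.interrupt(); // interrupts every thread in the group
            TimeUnit.MILLISECONDS.sleep(retryMs);
        }
        return group.activeCount() == 0 ? maxRounds : -1;
    }
}
```

`ThreadGroup.activeCount()` is only an estimate, which is one reason to keep re-checking and re-interrupting rather than interrupting once.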
The main DataNode thread joins on the BP service threads. There used to be a fixed 2-second sleep; it has been changed to wait only until shutdown is done. If the BP service threads have terminated but shutdown() was not called, the main thread still delays its exit by 2 seconds, as it did before.
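The main thread's new exit condition can be sketched as a bounded wait that shutdown wakes up early. MainThreadSketch, markShutdownComplete() and maxWaitMs are hypothetical names used only for illustration; maxWaitMs plays the role of the 2-second delay.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the DataNode main thread's exit path.
class MainThreadSketch {
    private final List<Thread> bpServiceThreads;
    private final long maxWaitMs;             // 2000 in the behavior described above
    private boolean shutdownComplete = false; // guarded by this

    MainThreadSketch(List<Thread> bpServiceThreads, long maxWaitMs) {
        this.bpServiceThreads = bpServiceThreads;
        this.maxWaitMs = maxWaitMs;
    }

    // Joins the BP service threads, then waits until shutdown has finished
    // or maxWaitMs elapses, whichever comes first.
    void join() throws InterruptedException {
        for (Thread t : bpServiceThreads) {
            t.join();
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        synchronized (this) {
            while (!shutdownComplete) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return; // shutdown never completed: exit after the fixed delay
                }
                wait(remainingMs);
            }
        }
    }

    // Called at the end of shutdown to release the main thread immediately.
    synchronized void markShutdownComplete() {
        shutdownComplete = true;
        notifyAll();
    }
}
```

The `remainingMs <= 0` check matters: `wait(0)` would block indefinitely instead of ending the delay.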
This patch does not include the client-side changes, so the OOB ack has no visible effect yet: clients treat it as a node failure, just as they do when a DataNode shuts down.