Today, a Kudu node's "shutdown" routine is merely exiting abruptly upon receipt of a signal, be it SIGINT, SIGTERM, or (obviously) SIGKILL. Any in-memory state (like MRS or DRS) is lost, and on startup, the WAL must be replayed as part of bootstrap.
It's not hard to conceive of a cleaner shutdown routine.It'd probably be issued via RPC, and it would perform the following steps:
- Quiesce the server so that future RPCs are dropped.
- Abdicate quorum leadership.
- Flush every MRS/DRS.
- GC every WAL.
- Exit gracefully (i.e. run through the TS/Master destructor).
Kudu is meant to recover in the event of a crash, so why bother with a clean shutdown? Why not make every shutdown an "abrupt" one? Well, a clean shutdown would take more time to run, but would also guarantee faster startup because there'd be less work to do during bootstrap. With a clean shutdown, time("work at shutdown") < time("work at startup"), and that would also help making Kudu rolling restarts more efficient. A similar tack was recently taken in HDFS for the same reason.
The easy part (step #5 from the above list) was recently implemented here.