Details
- Bug
- Status: Open
- Trivial
- Resolution: Unresolved
- M5
- None
Description
Today, a Kudu node's "shutdown" routine merely exits abruptly upon receipt of a signal, be it SIGINT, SIGTERM, or (obviously) SIGKILL. Any in-memory state (like MRS or DRS) is lost, and on startup, the WAL must be replayed as part of bootstrap.
It's not hard to conceive of a cleaner shutdown routine. It'd probably be issued via RPC, and it would perform the following steps:
- Quiesce the server so that future RPCs are dropped.
- Abdicate quorum leadership.
- Flush every MRS/DRS.
- GC every WAL.
- Exit gracefully (i.e. run through the TS/Master destructor).
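To make the ordering concrete, here is a minimal sketch of the sequence above. It is not Kudu code: `FakeServer`, `CleanShutdown`, and every method name are hypothetical stand-ins invented for illustration, and the "state" is just a pair of counters. The one constraint it encodes is that WAL GC only makes sense after the flush.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a tablet server or master. None of these
// names are Kudu's real API; they only mirror the steps listed above.
class FakeServer {
 public:
  FakeServer(int unflushed_rows, int wal_segments)
      : unflushed_rows_(unflushed_rows), wal_segments_(wal_segments) {}

  // Step 1: once quiesced, new RPCs are dropped.
  void Quiesce() { quiesced_ = true; steps_.push_back("quiesce"); }

  // Step 2: give up leadership so another replica can take over.
  void AbdicateLeadership() { leader_ = false; steps_.push_back("abdicate"); }

  // Step 3: persist all in-memory state.
  void FlushAll() { unflushed_rows_ = 0; steps_.push_back("flush"); }

  // Step 4: WAL segments are only safe to drop once nothing is unflushed.
  void GcWals() {
    assert(unflushed_rows_ == 0);
    wal_segments_ = 0;
    steps_.push_back("gc_wal");
  }

  bool AcceptsRpcs() const { return !quiesced_; }
  bool is_leader() const { return leader_; }
  int unflushed_rows() const { return unflushed_rows_; }
  int wal_segments() const { return wal_segments_; }
  const std::vector<std::string>& steps() const { return steps_; }

 private:
  bool quiesced_ = false;
  bool leader_ = true;
  int unflushed_rows_;
  int wal_segments_;
  std::vector<std::string> steps_;
};

// Runs steps 1-4 in order. Step 5 (graceful exit) is left to the caller:
// letting the server object go out of scope runs its destructor.
void CleanShutdown(FakeServer& server) {
  server.Quiesce();
  server.AbdicateLeadership();
  server.FlushAll();
  server.GcWals();
}
```

A caller would construct the server, invoke `CleanShutdown(server)` from whatever handles the shutdown RPC, and then return from `main` so the destructor runs. After the call there is nothing left to replay, which is the property that would shorten bootstrap.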
Kudu is meant to recover in the event of a crash, so why bother with a clean shutdown? Why not make every shutdown an "abrupt" one? Well, a clean shutdown would take more time to run, but it would also guarantee faster startup because there'd be less work to do during bootstrap. With a clean shutdown, time("work at shutdown") < time("work at startup"), i.e. the extra work done at shutdown costs less than the work it saves at startup, and that would also help make Kudu rolling restarts more efficient. A similar tack was recently taken in HDFS for the same reason.
The easy part (step #5 from the above list) was recently implemented here.
Issue Links
- is related to KUDU-2054 Rolling Restart and Upgrade (Resolved)