  1. Kudu
  2. KUDU-635

Implement clean shutdown

    Details

    • Type: Bug
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: M5
    • Fix Version/s: None
    • Component/s: recovery

      Description

      Today, a Kudu node's "shutdown" routine merely exits abruptly upon receipt of a signal, be it SIGINT, SIGTERM, or (obviously) SIGKILL. Any in-memory state (like MRS or DRS) is lost, and on the next startup the WAL must be replayed as part of bootstrap.

      It's not hard to conceive of a cleaner shutdown routine. It would probably be issued via RPC, and it would perform the following steps:

      1. Quiesce the server so that future RPCs are dropped.
      2. Abdicate quorum leadership.
      3. Flush every MRS/DRS.
      4. GC every WAL.
      5. Exit gracefully (i.e. run through the TS/Master destructor).
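
The five steps above can be sketched as a toy model. This is only an illustration of the ordering, not Kudu code: `Tablet`, `Server`, `clean_shutdown`, and `replay_work_at_startup` are hypothetical names, and the model assumes that once everything is flushed no WAL segment is needed for replay (a real server may retain some segments for other reasons).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tablet:
    """Hypothetical stand-in for a tablet replica; not a Kudu class."""
    name: str
    is_leader: bool = False
    unflushed_rows: int = 0   # rows that exist only in memory
    wal_segments: int = 0     # WAL segments that bootstrap would replay


@dataclass
class Server:
    """Hypothetical stand-in for a tablet server or master."""
    tablets: List[Tablet]
    accepting_rpcs: bool = True
    log: List[str] = field(default_factory=list)

    def clean_shutdown(self) -> None:
        # 1. Quiesce: stop accepting new RPCs.
        self.accepting_rpcs = False
        self.log.append("quiesce")

        # 2. Abdicate leadership so another replica can take over.
        for t in self.tablets:
            t.is_leader = False
        self.log.append("abdicate")

        # 3. Flush all in-memory state to disk.
        for t in self.tablets:
            t.unflushed_rows = 0
        self.log.append("flush")

        # 4. GC the WAL. This must follow the flush: WAL entries are
        #    only safe to drop once the data they cover is durable.
        for t in self.tablets:
            t.wal_segments = 0
        self.log.append("gc_wal")

        # 5. Exit gracefully (modeled here as a final log entry).
        self.log.append("exit")

    def replay_work_at_startup(self) -> int:
        """Number of WAL segments bootstrap would have to replay."""
        return sum(t.wal_segments for t in self.tablets)


server = Server([Tablet("t1", True, 10, 3), Tablet("t2", False, 5, 2)])
work_after_abrupt_exit = server.replay_work_at_startup()
server.clean_shutdown()
work_after_clean_exit = server.replay_work_at_startup()
```

In this model an abrupt exit leaves every outstanding WAL segment for bootstrap to replay, while the clean path leaves none; the ordering of steps 3 and 4 is the part that matters for correctness.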

      Kudu is meant to recover in the event of a crash, so why bother with a clean shutdown? Why not make every shutdown an "abrupt" one? A clean shutdown would take longer to run, but it would also guarantee a faster startup because there would be less work to do during bootstrap. With a clean shutdown, time("work at shutdown") < time("work at startup"); that is, the work is expected to cost less when done at shutdown than when deferred to startup. That would also help make Kudu rolling restarts more efficient. A similar tack was recently taken in HDFS for the same reason.

      The easy part (step #5 from the above list) was recently implemented here.

              People

              • Assignee:
                Unassigned
                Reporter:
                Adar Dembo
              • Votes:
                0
                Watchers:
                3