Kudu / KUDU-635

Implement clean shutdown

Details

    • Type: Bug
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • M5
    • None
    • recovery

    Description

      Today, a Kudu node's "shutdown" routine merely exits abruptly upon receipt of a signal, be it SIGINT, SIGTERM, or (obviously) SIGKILL. Any in-memory state (like MRS or DRS) is lost, and on startup the WAL must be replayed as part of bootstrap.

      It's not hard to conceive of a cleaner shutdown routine. It would probably be issued via RPC, and it would perform the following steps:

      1. Quiesce the server so that future RPCs are dropped.
      2. Abdicate quorum leadership.
      3. Flush every MRS/DRS.
      4. GC every WAL.
      5. Exit gracefully (i.e. run through the TS/Master destructor).
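
      The ordering of these steps matters: flushing only makes sense once no new writes can arrive, and WAL segments can only be discarded once the data they cover has been flushed. The following is a minimal sketch of that sequencing, not Kudu code. Every name in it (FakeServer, ShutdownStep, RunCleanShutdown, MakeCleanShutdownSteps) is hypothetical, and the choice to abort at the first failed step is an assumption made for illustration.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a tablet server or master; not a Kudu class.
struct FakeServer {
  bool accepting_rpcs = true;
  bool is_leader = true;
  int unflushed_rowsets = 3;  // stands in for in-memory state awaiting flush
  int wal_segments = 5;       // stands in for WAL segments on disk
  bool exited = false;
};

// One named step of the shutdown sequence. run() returns false on failure.
struct ShutdownStep {
  std::string name;
  std::function<bool()> run;
};

// Runs the steps in order and stops at the first failure.
// Returns the names of the steps that completed.
std::vector<std::string> RunCleanShutdown(const std::vector<ShutdownStep>& steps) {
  std::vector<std::string> completed;
  for (const auto& step : steps) {
    if (!step.run()) break;
    completed.push_back(step.name);
  }
  return completed;
}

// Builds the five steps from the list above, with the ordering
// constraints expressed as explicit precondition checks.
std::vector<ShutdownStep> MakeCleanShutdownSteps(FakeServer* server) {
  return {
      {"quiesce", [server] { server->accepting_rpcs = false; return true; }},
      {"abdicate", [server] { server->is_leader = false; return true; }},
      {"flush",
       [server] {
         // Flushing while RPCs are still accepted could miss new writes.
         if (server->accepting_rpcs) return false;
         server->unflushed_rowsets = 0;
         return true;
       }},
      {"gc_wal",
       [server] {
         // The WAL is only safe to discard once everything is flushed.
         if (server->unflushed_rowsets != 0) return false;
         server->wal_segments = 0;
         return true;
       }},
      {"exit", [server] { server->exited = true; return true; }},
  };
}
```

      If any step fails, the sketch simply stops. A real server could then fall back to today's abrupt exit, since WAL replay at bootstrap would still recover whatever was not flushed.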

      Kudu is meant to recover in the event of a crash, so why bother with a clean shutdown? Why not make every shutdown an "abrupt" one? A clean shutdown would take more time to run, but it would also guarantee faster startup because there would be less work to do during bootstrap. With a clean shutdown, time("work at shutdown") < time("work at startup"), which would also help make Kudu rolling restarts more efficient. A similar tack was recently taken in HDFS for the same reason.

      The easy part (step #5 from the above list) was recently implemented here.

            People

              Assignee: Unassigned
              Reporter: Adar Dembo