  Kudu / KUDU-635

Implement clean shutdown


Details

    • Type: Bug
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • M5
    • None
    • recovery

    Description

      Today, a Kudu node's "shutdown" routine merely exits abruptly upon receipt of a signal, be it SIGINT, SIGTERM, or (obviously) SIGKILL. Any in-memory state (like MRS or DRS) is lost, and on startup, the WAL must be replayed as part of bootstrap.

      It's not hard to conceive of a cleaner shutdown routine. It'd probably be issued via RPC, and it would perform the following steps:

      1. Quiesce the server so that future RPCs are dropped.
      2. Abdicate quorum leadership.
      3. Flush every MRS/DRS.
      4. GC every WAL.
      5. Exit gracefully (i.e. run through the TS/Master destructor).
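
      The ordering of the steps above can be sketched as follows. This is only an illustration under stated assumptions: FakeServer, its fields, and CleanShutdown are hypothetical names invented for this sketch and are not taken from Kudu's codebase. The sketch models the state each step touches and shows why WAL GC has to follow the flush.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a tablet server or master. None of these
// names come from Kudu's codebase; they only model the state that the
// steps touch.
struct FakeServer {
  bool accepting_rpcs = true;
  bool is_leader = true;
  int unflushed_stores = 3;  // in-memory stores not yet on disk
  int replay_segments = 5;   // WAL segments bootstrap would have to replay

  void Quiesce() { accepting_rpcs = false; }
  void StepDown() { is_leader = false; }
  void FlushAll() { unflushed_stores = 0; }

  // WAL segments are only safe to drop once nothing depends on them.
  bool GcWals() {
    if (unflushed_stores != 0) return false;
    replay_segments = 0;
    return true;
  }
};

// Runs steps 1-4 in order and returns the names of the completed steps.
// Step 5 (graceful exit) is left to the caller: letting the server object
// go out of scope runs its destructor.
std::vector<std::string> CleanShutdown(FakeServer& server) {
  std::vector<std::string> done;

  server.Quiesce();
  done.push_back("quiesce");

  server.StepDown();
  done.push_back("step_down");

  // In this model nothing new arrives after quiescing, so one flush
  // pass leaves no in-memory state behind.
  assert(!server.accepting_rpcs);
  server.FlushAll();
  done.push_back("flush");

  if (server.GcWals()) done.push_back("gc_wal");
  return done;
}
```

      In this model, calling CleanShutdown on a fresh FakeServer leaves replay_segments at zero, which corresponds to bootstrap having nothing to replay on the next startup. A real implementation would also have to handle in-flight operations and step failures, which the sketch ignores.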

      Kudu is meant to recover in the event of a crash, so why bother with a clean shutdown? Why not make every shutdown an "abrupt" one? Well, a clean shutdown would take more time to run, but it would also guarantee faster startup because there'd be less work to do during bootstrap. The extra work done at shutdown should cost less than the bootstrap work it saves, i.e. time("work at shutdown") < time("work at startup"), and that would also help make Kudu rolling restarts more efficient. A similar tack was recently taken in HDFS for the same reason.

      The easy part (step #5 from the above list) was recently implemented here.



          People

            Assignee: Unassigned
            Reporter: Adar Dembo
