  1. Kudu
  2. KUDU-1194

consensus: Allow abort of uncommittable config change ops

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: consensus
    • Labels: None

    Description

      This issue captures a few initial thoughts about manually fixing broken configs and automatically rolling back bad config changes. It is not a fully baked design.

      A general way to (attempt to) abort uncommitted ops is to truncate the Raft log on the leader (and replace the op with a NO_OP or something similar).
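The truncate-and-replace idea above can be sketched roughly as follows. This is a minimal illustration, not Kudu's implementation: `LogEntry`, `LeaderLog`, and `abort_uncommitted_from` are hypothetical names. The sketch assumes the replacement NO_OP is written in a higher term than the aborted op, so that a follower already holding the old entry sees a term mismatch at that index instead of treating the two entries as identical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogEntry:
    index: int
    term: int
    op_type: str  # e.g. "WRITE", "CHANGE_CONFIG", "NO_OP"


@dataclass
class LeaderLog:
    """Hypothetical in-memory stand-in for a leader's Raft log."""
    entries: List[LogEntry] = field(default_factory=list)
    committed_index: int = 0

    def abort_uncommitted_from(self, index: int, term: int) -> List[LogEntry]:
        """Truncate every entry at or after `index`, then append a NO_OP there.

        Returns the aborted entries. Committed entries are never touched.
        """
        if index <= self.committed_index:
            raise ValueError("refusing to truncate a committed op")
        aborted = [e for e in self.entries if e.index >= index]
        if not aborted:
            raise ValueError("no uncommitted op at or after this index")
        # Assumption: the NO_OP needs a newer term than anything it replaces,
        # otherwise (index, term) would identify two different entries.
        if term <= max(e.term for e in aborted):
            raise ValueError("replacement term must exceed aborted op terms")
        self.entries = [e for e in self.entries if e.index < index]
        self.entries.append(LogEntry(index=index, term=term, op_type="NO_OP"))
        return aborted


log = LeaderLog(
    entries=[LogEntry(1, 1, "WRITE"), LogEntry(2, 1, "CHANGE_CONFIG")],
    committed_index=1,
)
aborted_ops = log.abort_uncommitted_from(index=2, term=2)
```

After the call, the stuck CHANGE_CONFIG at index 2 is gone and the log ends in a NO_OP at the same index. Whether the abort actually takes effect still depends on the NO_OP itself being replicated and committed, which is why the description above calls this an attempt.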

      Some thoughts on recovering from "bad" configs:

      • We may hit a situation where there is an in-progress config change operation that will be impossible to commit due to a majority of the nodes in the "target" config being permanently dead. If the leader is still alive, we can provide a timeout on these ops or a way to explicitly (via RPC) abort them by truncating the log.
      • If no leader is alive, and it's impossible to elect one, then we could write an "unsafe" tool only for emergency use that could do something evil like make the follower think that the tool is the new leader and append an unsafe change-config op to the follower's log.
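As a rough sketch of the first case, a leader could combine a timeout with a check of whether a majority of the target config is still reachable. The function names, the `timeout_s` parameter, and the policy of requiring both conditions are assumptions made for illustration. A real leader cannot know that a peer is permanently dead, only that it has been unresponsive.

```python
import time
from typing import Optional, Set


def majority_size(num_voters: int) -> int:
    """Smallest number of voters that forms a majority."""
    return num_voters // 2 + 1


def is_committable(target_config: Set[str], live_peers: Set[str]) -> bool:
    """True if enough voters in the target config are reachable to commit."""
    reachable = target_config & live_peers
    return len(reachable) >= majority_size(len(target_config))


def should_abort_config_change(
    target_config: Set[str],
    live_peers: Set[str],
    started_at: float,
    timeout_s: float,
    now: Optional[float] = None,
) -> bool:
    """Hypothetical policy: abort a pending config change once it has been
    pending for at least `timeout_s` seconds and a majority of the target
    config is currently unreachable."""
    now = time.monotonic() if now is None else now
    timed_out = (now - started_at) >= timeout_s
    return timed_out and not is_committable(target_config, live_peers)


# Target config A, B, C with only A alive: 1 of 3 voters is not a majority.
stuck = should_abort_config_change(
    target_config={"A", "B", "C"},
    live_peers={"A"},
    started_at=0.0,
    timeout_s=30.0,
    now=45.0,
)
```

An explicit abort RPC could skip the timeout and apply only the reachability check, or skip both and trust the operator. The second case, where no leader can be elected, is deliberately not sketched here because it bypasses the protocol's safety guarantees.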

            People

              Assignee: Mike Percy (mpercy)
              Reporter: Mike Percy (mpercy)