Kudu / KUDU-3530

Add guardrails to prevent inconsistencies on attempts to add multiple Kudu masters at once in a cluster


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: master
    • Labels: None

    Description

      There have been a few reports of inconsistencies in the system catalog tablet's Raft configuration after attempts to add multiple new masters to a Kudu cluster at once. The current implementation of the AddMaster RPC appears not to be thread-safe: the Raft configuration of the system catalog tablet became corrupted after an attempt to add several extra masters at once (i.e., starting several of the to-be-added masters simultaneously). Upon the next restart, the original Kudu master reported an error like the one below:

      Invalid argument: RunMasterServer() failed: Unable to initialize catalog manager: Failed to initialize sys tables async: on-disk master list (:0) and provided master list (m1.my.org:7051, m2.my.org:7051, m3.my.org:7051) differ by more than one address. Their symmetric difference is: :0, m1.my.org:7051, m2.my.org:7051, m3.my.org:7051
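      The refusal to start can be reproduced from the wording of the message alone. Below is a minimal Python sketch, assuming (only from the error text) that the master rejects startup when the on-disk and provided master lists differ by more than one address; `symmetric_difference` and `master_lists_compatible` are hypothetical helpers, not Kudu code.

```python
def symmetric_difference(on_disk, provided):
    """Return the master addresses present in exactly one of the two lists, sorted."""
    return sorted(set(on_disk) ^ set(provided))


def master_lists_compatible(on_disk, provided):
    """Hypothetical check modeled on the error message: the lists may differ
    by at most one address (e.g. a single master being added)."""
    return len(symmetric_difference(on_disk, provided)) <= 1


# The corrupted on-disk config from the report holds a single bogus entry ":0".
on_disk = [":0"]
provided = ["m1.my.org:7051", "m2.my.org:7051", "m3.my.org:7051"]

print(symmetric_difference(on_disk, provided))
# → [':0', 'm1.my.org:7051', 'm2.my.org:7051', 'm3.my.org:7051']
print(master_lists_compatible(on_disk, provided))  # → False

# Adding exactly one master (m3) to a healthy two-master config stays within the limit.
print(master_lists_compatible(["m1.my.org:7051", "m2.my.org:7051"], provided))  # → True
```

      With the corrupted ":0" entry, all four addresses end up in the symmetric difference, far beyond the single address that one AddMaster step could account for.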
      

      It would be great to have guardrails preventing such corruption. Essentially, we should enforce the one-new-master-at-a-time invariant, which the current implementation implicitly assumes but doesn't consistently enforce.
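      One possible shape for such a guardrail is an atomic check-and-reserve step that admits at most one membership change at a time. The following is a minimal Python sketch under stated assumptions: `AddMasterGuard`, `begin_add`, and `commit_add` are hypothetical names, not Kudu APIs, and the in-memory lists stand in for the system catalog tablet's committed Raft configuration and any pending configuration change. It simulates several to-be-added masters racing, as in the report.

```python
import threading


class AddMasterGuard:
    """Hypothetical guardrail (not Kudu code): at most one AddMaster in flight.

    A new master is admitted only if no other membership change is pending;
    concurrent callers are rejected instead of racing on the Raft config.
    """

    def __init__(self, masters):
        self._lock = threading.Lock()
        self._committed = list(masters)  # stand-in for the committed Raft config
        self._pending = None             # address currently being added, if any

    def begin_add(self, addr):
        """Atomically reserve the single change slot; False if it is taken."""
        with self._lock:
            if self._pending is not None or addr in self._committed:
                return False
            self._pending = addr
            return True

    def commit_add(self, addr):
        """Mark the pending add of `addr` as committed and free the slot."""
        with self._lock:
            if self._pending != addr:
                raise RuntimeError(f"no pending add for {addr}")
            self._committed.append(addr)
            self._pending = None

    def masters(self):
        with self._lock:
            return list(self._committed)


guard = AddMasterGuard(["m1.my.org:7051"])
candidates = ["m2.my.org:7051", "m3.my.org:7051", "m4.my.org:7051"]
barrier = threading.Barrier(len(candidates))
admitted = []
admitted_lock = threading.Lock()


def try_add(addr):
    barrier.wait()  # release all attempts at (roughly) the same moment
    if guard.begin_add(addr):
        with admitted_lock:
            admitted.append(addr)


threads = [threading.Thread(target=try_add, args=(a,)) for a in candidates]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(admitted))  # → 1: only one new master is admitted
guard.commit_add(admitted[0])
print(len(guard.masters()))  # → 2
```

      Only the check-and-reserve step runs under the lock, so the slow part of adding a master (contacting peers, replicating the config change) could happen outside it while competing attempts fail fast instead of racing. This mirrors Raft's single-server membership change approach, which is safe only when one change is in progress at a time. A real guardrail would also need to survive leader changes, e.g. by consulting the replicated pending config rather than a process-local flag.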


          People

            Assignee: Unassigned
            Reporter: Alexey Serbin (aserbin)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: