Kudu / KUDU-1735

CHECK failure when aborting an ignored config change operation


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.1
    • Fix Version/s: 1.1.0
    • Component/s: consensus
    • Labels: None

Description

    The following sequence causes a CHECK failure:

    • a tablet server receives a config change operation (CHANGE_CONFIG_OP)
    • the tablet server commits the operation, writing the new consensus config to disk, but crashes before it can write the associated COMMIT message to the log
    • the server stays down long enough to be removed from the configuration again
    • when it restarts, log replay finds the CHANGE_CONFIG_OP again as a pending REPLICATE with no COMMIT. When the round is added to PendingRounds, it is ignored because this configuration is already committed, so no pending config is set.
    • the master sends the tablet server a DeleteTablet request because it is no longer part of the configuration
      • while aborting the pending operation, the server fires the CHECK "Aborting CHANGE_CONFIG_OP but there was no pending config set."
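The sequence above can be illustrated with a toy Python model of the replica state involved. This is a sketch under stated assumptions, not Kudu code: Kudu's consensus is written in C++, and every name here (ReplicaState, committed_config_index, pending_config_index, tolerate_ignored, replay_after_crash) is hypothetical. It assumes a replayed config change is "ignored" when its opid index is not newer than the committed config's, and that in that case no pending config is recorded; the opid index 10 is arbitrary. The tolerate_ignored guard shows one possible way to avoid the crash and is not necessarily the fix that resolved this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class CheckFailure(Exception):
    """Stands in for a fatal CHECK failure in the C++ tablet server."""


@dataclass
class ReplicaState:
    # Hypothetical: opid index of the config already committed to consensus metadata on disk.
    committed_config_index: int
    # Hypothetical: opid index of the config change awaiting commit, if any.
    pending_config_index: Optional[int] = None
    # Opid indexes of rounds replayed from the log without a COMMIT message.
    pending_rounds: List[int] = field(default_factory=list)

    def add_pending_config_change(self, opid_index: int) -> None:
        self.pending_rounds.append(opid_index)
        if opid_index <= self.committed_config_index:
            # The config was committed before the crash, so the round is
            # ignored and no pending config is recorded.
            return
        self.pending_config_index = opid_index

    def abort_config_change(self, opid_index: int, tolerate_ignored: bool) -> None:
        self.pending_rounds.remove(opid_index)
        if self.pending_config_index is None:
            if tolerate_ignored and opid_index <= self.committed_config_index:
                # Hypothetical guard: the round was ignored when it was added,
                # so there is no pending config to roll back.
                return
            raise CheckFailure(
                "Aborting CHANGE_CONFIG_OP but there was no pending config set.")
        self.pending_config_index = None


def replay_after_crash(tolerate_ignored: bool) -> ReplicaState:
    # Before the crash, the config change at opid index 10 was written to the
    # consensus metadata, but its COMMIT message never reached the log.
    replica = ReplicaState(committed_config_index=10)
    # On restart, log replay finds the REPLICATE for index 10 with no COMMIT.
    replica.add_pending_config_change(10)
    # Handling DeleteTablet from the master aborts the pending round.
    replica.abort_config_change(10, tolerate_ignored)
    return replica
```

Calling replay_after_crash(tolerate_ignored=False) raises CheckFailure with the message from the report; with tolerate_ignored=True, the abort treats the ignored round as having nothing to roll back.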

People

    • Assignee: Todd Lipcon (tlipcon)
    • Reporter: Todd Lipcon (tlipcon)