Kudu / KUDU-1337

DeleteTablet can cause spurious unfruitful remote bootstraps

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: recovery, tserver
    • Labels:
      None

      Description

      While triaging a cascading YCSB failure, we noticed the following sequence of events:

      1. Client deleted a table.
      2. Master serviced the request.
      3. Master issued DeleteTablet for a particular tablet to a quorum of 3 peers.
      4. Due to load or some other delay, the followers received and processed the DeleteTablet before the leader.
      5. The leader noticed that the followers no longer had the tablet, and told them to remote bootstrap it from itself.
      6. The leader began servicing the DeleteTablet.
      7. The followers began remote bootstrapping, which killed the leader due to KUDU-1328. If the leader hadn't died, the followers' remote bootstrap sessions would have failed.
      8. There's an open question for this step: is any bad "state" left in the followers? Or do the remote bootstrap sessions abort cleanly?

      Anyway, the fact that the followers handled the DeleteTablet before the leader led to unnecessary remote bootstrap work. We should avoid this.
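
      The ordering dependence in the sequence above can be illustrated with a toy model. This is only a sketch and not Kudu code: the Peer class, the peer names, and the wasted_bootstraps function are all hypothetical, and the model assumes the leader checks its followers after every DeleteTablet delivery. It does not model the crash from KUDU-1328; it only counts remote bootstrap requests that can never complete.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    # Hypothetical stand-in for a tablet replica; not a Kudu class.
    name: str
    has_tablet: bool = True


def wasted_bootstraps(delivery_order):
    """Return the followers told to remote bootstrap a tablet that is being deleted.

    delivery_order lists peer names in the order each one processes DeleteTablet.
    """
    peers = {n: Peer(n) for n in ("leader", "follower-a", "follower-b")}
    leader = peers["leader"]
    requested = []

    for name in delivery_order:
        peers[name].has_tablet = False  # this peer processes DeleteTablet
        if not leader.has_tablet:
            continue  # a leader that has deleted the tablet starts no new sessions
        # Step 5: the leader sees a follower without the tablet and asks it
        # to remote bootstrap from the leader.
        for peer in peers.values():
            if peer is not leader and not peer.has_tablet and peer.name not in requested:
                requested.append(peer.name)

    # Steps 6-7: every peer ends up processing DeleteTablet, so none of the
    # requested sessions can complete; all of them are wasted work.
    return requested


if __name__ == "__main__":
    print(wasted_bootstraps(["follower-a", "follower-b", "leader"]))  # ['follower-a', 'follower-b']
    print(wasted_bootstraps(["follower-a", "leader", "follower-b"]))  # ['follower-a']
    print(wasted_bootstraps(["leader", "follower-a", "follower-b"]))  # []
```

      In this model the wasted work disappears only when the leader processes the DeleteTablet first, which matches the observation that the followers getting ahead of the leader is what triggers the spurious sessions.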

      Note: Todd suspects that delete_table-test's flakiness may be due to this behavior. I didn't look into it, but whoever tackles this should consider that possibility.

              People

              • Assignee: Mike Percy (mpercy)
              • Reporter: Adar Dembo (adar)
              • Votes: 0
              • Watchers: 4
