Kudu / KUDU-1337

DeleteTablet can cause spurious unfruitful remote bootstraps

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: recovery, tserver
    • Labels: None

    Description

      While triaging a cascading YCSB failure, we noticed the following sequence of events:

      1. Client deleted a table.
      2. Master serviced the request.
      3. Master issued DeleteTablet for a particular tablet to a quorum of 3 peers.
      4. The followers received and processed the DeleteTablet before the leader did, possibly because of load.
      5. The leader noticed that the followers no longer had the tablet, and told them to remote bootstrap it from itself.
      6. The leader began servicing the DeleteTablet.
      7. The followers began remote bootstrapping, which killed the leader due to KUDU-1328. If the leader hadn't died, the followers' remote bootstrap sessions would have failed.
      8. There's an open question for this step: is any bad "state" left in the followers? Or do the remote bootstrap sessions abort cleanly?

      Regardless, because the followers handled the DeleteTablet before the leader did, they were asked to do unnecessary remote bootstrap work. We should avoid this.
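
      The ordering dependence in the steps above can be illustrated with a toy model. This is only a sketch and not Kudu code: the Replica and Leader classes, the check_followers method, and the run helper are all hypothetical names invented for illustration, and the model assumes the leader re-examines its followers after every delivered DeleteTablet.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a tablet server hosting one tablet replica."""
    name: str
    has_tablet: bool = True

    def delete_tablet(self) -> None:
        self.has_tablet = False


@dataclass
class Leader(Replica):
    """Hypothetical leader that re-seeds followers that lack the tablet."""
    bootstraps_started: list = field(default_factory=list)

    def check_followers(self, followers) -> None:
        # Assumption: a leader that has already deleted its own replica
        # no longer tries to repair its followers.
        if not self.has_tablet:
            return
        for follower in followers:
            missing = not follower.has_tablet
            if missing and follower.name not in self.bootstraps_started:
                # Spurious work: the tablet is about to be deleted everywhere.
                self.bootstraps_started.append(follower.name)


def run(delivery_order):
    """Deliver DeleteTablet to peers in the given order.

    Returns the names of followers the leader asked to remote bootstrap.
    """
    leader = Leader("leader")
    followers = [Replica("follower-1"), Replica("follower-2")]
    peers = {peer.name: peer for peer in [leader, *followers]}
    for name in delivery_order:
        peers[name].delete_tablet()
        leader.check_followers(followers)
    return leader.bootstraps_started


# Followers process the delete first: the leader starts pointless bootstraps.
followers_first = run(["follower-1", "follower-2", "leader"])

# Leader processes the delete first: no bootstrap is triggered.
leader_first = run(["leader", "follower-1", "follower-2"])
```

      In this model, only the delivery order differs between the two runs, which is why the unnecessary work appears in one and not the other.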

      Note: Todd suspects that delete_table-test's flakiness may be due to this behavior. I didn't look into it, but whoever tackles this should consider that possibility.

            People

              Assignee: mpercy Mike Percy
              Reporter: adar Adar Dembo