Kudu / KUDU-1605

Blocks can be incorrectly deleted if TS crashes mid-tablet-copy


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 1.0.0
    • Component/s: tserver
    • Labels: None

    Description

      There's currently a bug in the way we handle tablet copies while replacing existing tombstoned tablets:

      • a tablet exists in TABLET_DATA_TOMBSTONED state
      • we begin copying a new replica on top of this one
        • this calls TabletMetadata::ReplaceSuperBlock() using the remote superblock (importantly, this remote superblock contains remote block IDs)
      • we crash mid-copy
      • on restart, we see the "TABLET_DATA_COPYING" state and "roll forward" the deletion of this tablet. However, the block IDs in the persisted superblock are the IDs from the remote machine, so we incorrectly delete whichever local blocks happen to share those IDs, even though they belong to other tablets.

      This has always been an issue, but the fix for KUDU-1538 in 0.10 made it much worse. Since that fix, a remote block ID is quite likely to match a local one, whereas previously such collisions were rare enough that this bug usually went unnoticed.
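The failure sequence described above can be modeled in a few lines of C++. This is a hypothetical, heavily simplified sketch, not Kudu's code: `TabletServer`, `SuperBlock`, `DataState`, `RollForwardOnRestart`, `BuggyCopyThenCrash` and `SaferCopyThenCrash` are illustrative names only. The "safer" ordering (persist the COPYING state with an empty block list, then record each block only after it has been written locally under a local ID) is one possible mitigation assumed for illustration, not necessarily the change that resolved this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, heavily simplified model of one tablet server. None of these
// names are Kudu's real API; they only mirror the steps in the description.
enum class DataState { kTombstoned, kCopying, kReady };

struct SuperBlock {
  DataState state = DataState::kTombstoned;
  std::vector<uint64_t> block_ids;  // blocks this tablet claims to own
};

struct TabletServer {
  std::set<uint64_t> blocks;  // every block in the local block store
  SuperBlock tablet;          // persisted superblock of the tombstoned tablet
};

// On restart, a tablet left in COPYING is "rolled forward" to deletion:
// every block listed in its persisted superblock is deleted.
void RollForwardOnRestart(TabletServer& ts) {
  if (ts.tablet.state != DataState::kCopying) return;
  for (uint64_t id : ts.tablet.block_ids) ts.blocks.erase(id);
  ts.tablet = SuperBlock{};  // back to TOMBSTONED with no blocks
}

// Buggy ordering: persist the remote superblock, including the remote
// block IDs, then crash before any block has been copied.
void BuggyCopyThenCrash(TabletServer& ts,
                        const std::vector<uint64_t>& remote_ids) {
  ts.tablet.state = DataState::kCopying;
  ts.tablet.block_ids = remote_ids;  // IDs from another machine
  // ... crash here ...
}

// Safer ordering (assumed mitigation, for illustration only): persist
// COPYING with an empty block list, then record each block only after it
// has been written locally under a freshly allocated local ID.
void SaferCopyThenCrash(TabletServer& ts,
                        const std::vector<uint64_t>& remote_ids,
                        uint64_t& next_local_id,
                        std::size_t copied_before_crash) {
  ts.tablet.state = DataState::kCopying;
  ts.tablet.block_ids.clear();
  for (std::size_t i = 0; i < copied_before_crash && i < remote_ids.size();
       ++i) {
    uint64_t local_id = next_local_id++;
    ts.blocks.insert(local_id);               // write the block locally first
    ts.tablet.block_ids.push_back(local_id);  // then record ownership
  }
  // ... crash here ...
}
```

For example, suppose the local block store holds blocks {1, 2, 3} owned by other tablets and the remote superblock lists IDs {2, 3, 4}. Running `BuggyCopyThenCrash` followed by `RollForwardOnRestart` leaves only block 1, because blocks 2 and 3 are deleted. With `SaferCopyThenCrash` (next local ID 4, one block copied before the crash), the roll-forward deletes only the newly written block 4 and leaves {1, 2, 3} intact.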


          People

            Assignee: Todd Lipcon
            Reporter: Todd Lipcon
