Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- 1.3.0
- None
Description
Commit 72541b47eb55b2df4eab5d6050f517476ed6d370 (KUDU-1853) caused a serious regression when a tablet copy fails. As of that patch, the following sequence happens:
- we fetch the remote tablet's metadata, and set our local metadata to match it (including the remote block IDs)
- as we download blocks, we replace the remote block IDs in the metadata with local block IDs
- if we fail in the middle, we call DeleteTablet
- since the metadata still contains some remote block IDs at that point, the DeleteTablet call deletes local blocks based on remote block IDs. These block IDs are likely to belong to other live tablets locally!
This can cause serious data loss, and it tends to cascade around a cluster, since later attempts to copy a tablet with missing blocks will get aborted as well.
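The sequence above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the block store is a plain dict, block IDs are small integers, and every name here (`failed_copy_then_delete`, `fail_after`, `next_local_id`) is hypothetical and not part of Kudu's code.

```python
# Toy model only: none of these names are Kudu APIs.

def failed_copy_then_delete(local_store, remote_block_ids, fail_after, next_local_id):
    """Simulate the failed-copy sequence and return the (id, owner) pairs deleted."""
    # Step 1: local metadata starts as a copy of the remote metadata,
    # so it holds remote block IDs.
    metadata = list(remote_block_ids)

    # Step 2: each downloaded block gets a fresh local ID, which replaces
    # the corresponding remote ID in the metadata.
    for i in range(fail_after):
        local_id = next_local_id + i
        local_store[local_id] = "copied-tablet"
        metadata[i] = local_id

    # Step 3: the copy fails part-way. The delete step removes every block
    # listed in the metadata, including the not-yet-replaced remote IDs.
    deleted = []
    for block_id in metadata:
        if block_id in local_store:
            deleted.append((block_id, local_store.pop(block_id)))
    return deleted


# Local store: block ID -> owning tablet. Blocks 1-3 belong to a live tablet.
store = {1: "live-tablet", 2: "live-tablet", 3: "live-tablet"}
# Assumption: the remote tablet happens to use IDs 1-3 in its own ID space.
deleted = failed_copy_then_delete(store, [1, 2, 3], fail_after=1, next_local_id=100)
victims = [bid for bid, owner in deleted if owner == "live-tablet"]
print(victims)
```

In this model the copy fails after one block, so two remote IDs remain in the metadata, and the delete step removes the live tablet's blocks that happen to share those IDs.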
Issue Links
- relates to KUDU-1853 Error during tablet copy may orphan a bunch of stuff (Resolved)