Details
- New Feature
- Status: Open
- Major
- Resolution: Unresolved
- 1.7.0
- None
- None
Description
There's an established Kudu workflow for manually "repairing" a tablet that has only one working replica, using the `kudu remote_replica unsafe_change_config` CLI tool. I put "repairing" in quotes because, while the tool brings the tablet back to a healthy state as far as Kudu is concerned, the tablet may have suffered data loss. In some circumstances, however, that's something users are willing to accept.
The problem arises when this happens writ large, across an entire cluster. For example, suppose a three-node cluster hosting 1000 tablets loses two nodes: every one of those tablets is left with a single working replica and needs the same manual repair. It should be possible to automate this repair process so that users needn't script it themselves.
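To illustrate the kind of script users end up writing today, here is a minimal sketch that only builds the per-tablet command lines and does not run them. It assumes the manual workflow is invoked as `kudu remote_replica unsafe_change_config <tserver_address> <tablet_id> <peer_uuid>`. The function name `build_repair_commands`, the address, the UUID, and the tablet IDs are hypothetical placeholders, and how the list of affected tablets is obtained is left out entirely.

```python
from typing import Iterable, List


def build_repair_commands(
    tserver_address: str,
    tserver_uuid: str,
    tablet_ids: Iterable[str],
) -> List[List[str]]:
    """Build one unsafe_change_config invocation per tablet.

    Assumption: every tablet in tablet_ids has exactly one working
    replica, and it lives on the surviving tablet server identified by
    tserver_address / tserver_uuid. Nothing here verifies that.
    """
    commands = []
    for tablet_id in tablet_ids:
        # Force the tablet's Raft config down to the single surviving peer.
        # This may lose data, which is why the tool is named "unsafe".
        commands.append([
            "kudu", "remote_replica", "unsafe_change_config",
            tserver_address, tablet_id, tserver_uuid,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for cmd in build_repair_commands(
        "tserver-00:7050", "surviving-tserver-uuid", ["tablet-a", "tablet-b"]
    ):
        print(" ".join(cmd))
```

With 1000 tablets this produces 1000 invocations, each of which would then have to be run and checked by hand or with further scripting, which is the burden this issue proposes to remove.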
Issue Links
- relates to KUDU-2410 Add auto-repair function to ksck to repair "stuck tablet" situations common on older versions (Resolved)