Description
Currently, Kudu checks the integrity of its FS layout by verifying that all data dirs exist where they're expected and that each of them "knows" about the rest of the data dirs in the layout. When a data dir is missing on disk (e.g. because the underlying disk was pulled and a new one was put in), every other data dir still expects the one that is now missing. Following KUDU-2359, Kudu accepts this and starts up, but labels the data dir as "failed", alerting users that something on disk is inconsistent with their FS config. At that point, they can run `kudu fs update_dirs` with the expected directories.
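To make the check concrete, here is a minimal sketch in Python of the behavior described above. It is a simplified model, not Kudu's actual implementation: `DirMetadata`, `check_layout`, and the "ok"/"failed" labels are hypothetical names chosen for illustration, and it assumes each data dir records its own UUID plus the set of UUIDs of every dir in the layout.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class DirMetadata:
    """Hypothetical per-directory metadata (illustration only)."""
    uuid: str                     # this directory's own identifier
    known_uuids: FrozenSet[str]   # every directory it believes is in the layout


def check_layout(
    on_disk: Dict[str, Optional[DirMetadata]],
) -> Tuple[Dict[str, str], List[str]]:
    """Label each configured path and list UUIDs nobody on disk accounts for.

    `on_disk` maps a configured path to the metadata found there, or None
    when the directory is missing or empty (e.g. a freshly swapped disk).
    """
    present = {p: m for p, m in on_disk.items() if m is not None}
    if not present:
        raise RuntimeError("no data directory metadata found")

    # Every surviving directory must agree on what the layout contains.
    layouts = {m.known_uuids for m in present.values()}
    if len(layouts) != 1:
        raise RuntimeError("data directories disagree about the layout")
    expected = next(iter(layouts))

    # Mirror the post-KUDU-2359 behavior: start up, but label the
    # inconsistent directory as failed instead of refusing to start.
    status = {}
    for path, meta in on_disk.items():
        if meta is None or meta.uuid not in expected:
            status[path] = "failed"
        else:
            status[path] = "ok"

    missing = sorted(expected - {m.uuid for m in present.values()})
    return status, missing


uuids = frozenset({"a", "b", "c"})
status, missing = check_layout({
    "/data/1": DirMetadata("a", uuids),
    "/data/2": DirMetadata("b", uuids),
    "/data/3": None,  # disk was replaced; nothing on it yet
})
```

In this example the two surviving dirs still list UUID `c`, so `/data/3` is labeled failed and `c` is reported as missing, which is the state in which a user would currently have to run `kudu fs update_dirs`.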
This isn't a great user experience for a couple of reasons: 1) it adds legwork and downtime when recovering from disk failures, performing hardware upgrades, etc.; 2) if the user is repairing a disk failure, the "new" directories passed to the `kudu fs update_dirs` tool will be identical to the old ones (or, more cautiously, the update will be done as a removal followed by an addition), which is somewhat confusing. The `kudu fs update_dirs` tool is already smart enough to tell users when attention is needed (e.g. when removing directories that have tablets striped across them), so it wouldn't be unreasonable to run it, or mirror its behavior, as part of server startup.
For administrators who prefer tooling, it probably makes sense to keep the current, more conservative, less automatic codepaths and gate them behind some flag.
Issue Links
- is a clone of KUDU-2993 Allow Kudu to start up with a fresh data directory without running update_dirs (Resolved)