KUDU-2627

Automatically "fix" inconsistent data directories

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: fs
    • Labels: None

    Description

      Currently, Kudu checks the integrity of its FS layout by verifying that every data dir exists where it is expected and that each one "knows" about all of the other data dirs in the layout. When a data dir is missing on disk (e.g. because the underlying disk was yanked and a new one was put in), every remaining data dir still expects a data dir that is no longer there. Following KUDU-2359, Kudu accepts this and starts up, but labels the missing data dir as "failed" to alert users that something on disk is inconsistent with their FS config. At that point, they can run `kudu fs update_dirs` with the expected directories.
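
      As a rough illustration of the check described above, here is a minimal Python sketch. It is not Kudu's actual implementation: `DirMetadata`, `check_layout`, and the example paths and UUIDs are hypothetical names invented for this example, and the real metadata format is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class DirMetadata:
    """Hypothetical stand-in for the metadata stored in each data dir."""
    uuid: str                  # identity of this data dir
    all_uuids: FrozenSet[str]  # every data dir this one "knows" about


def check_layout(
    configured: Dict[str, Optional[DirMetadata]],
) -> Tuple[Dict[str, str], Set[str]]:
    """Classify each configured path and report expected-but-missing UUIDs.

    `configured` maps a data dir path to the metadata found there, or to
    None when nothing usable was found at that path.
    """
    status: Dict[str, str] = {}
    expected: Set[str] = set()
    found: Set[str] = set()
    for path, meta in configured.items():
        if meta is None:
            # Mirrors the post-KUDU-2359 behavior: start up anyway,
            # but label the directory as failed.
            status[path] = "failed"
        else:
            status[path] = "ok"
            found.add(meta.uuid)
            expected |= meta.all_uuids
    # UUIDs that healthy dirs still expect but that no dir provides.
    return status, expected - found


layout = frozenset({"uuid-a", "uuid-b", "uuid-c"})
status, missing = check_layout({
    "/data/1": DirMetadata("uuid-a", layout),
    "/data/2": DirMetadata("uuid-b", layout),
    "/data/3": None,  # disk was swapped; no metadata on the new one
})
```

      In this example, `/data/3` is labeled "failed" and `uuid-c` is reported as expected but missing, which is the inconsistency that currently has to be resolved by hand.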

      This isn't a great user experience for a couple of reasons: 1) it adds legwork and downtime when recovering from disk failures, performing hardware upgrades, etc.; 2) if the user is repairing a disk failure, the "new" directories passed to the `kudu fs update_dirs` tool will be identical to the old ones (or, more cautiously, the update is done as a removal followed by an addition), which is somewhat confusing. The `kudu fs update_dirs` tool is already smart enough to tell users when attention is needed (e.g. when removing directories that have tablets striped across them), so it seems reasonable to run it, or mirror its behavior, as part of server startup.

      For administrators who prefer explicit tooling, it probably makes sense to keep the current, more conservative, less automatic code paths and to gate the automatic behavior behind a flag.
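
      One way to picture the proposed flag-gated behavior is the decision sketch below. It is only an assumption about how such a startup path could be structured: `plan_startup`, the `auto_fix_dirs` parameter, and the returned action names are hypothetical and do not correspond to existing Kudu flags or code.

```python
from typing import Set


def plan_startup(
    missing_dirs: Set[str],
    dirs_with_tablets: Set[str],
    auto_fix_dirs: bool,
) -> str:
    """Pick a startup action for a server whose data dirs may be inconsistent.

    missing_dirs:      data dirs the layout expects but that are absent.
    dirs_with_tablets: data dirs that have tablets striped across them.
    auto_fix_dirs:     hypothetical flag enabling the automatic behavior.
    """
    if not missing_dirs:
        return "start_normally"
    if not auto_fix_dirs:
        # Current, conservative path: start up, label dirs as failed,
        # and leave the repair to `kudu fs update_dirs`.
        return "start_with_failed_dirs"
    if missing_dirs & dirs_with_tablets:
        # Same kind of situation in which the tool asks for attention,
        # so this sketch falls back to the conservative path.
        return "start_with_failed_dirs"
    # Nothing depends on the missing dirs: update the layout, then start.
    return "update_dirs_then_start"


action = plan_startup({"/data/3"}, {"/data/1"}, auto_fix_dirs=True)
```

      Here `action` is "update_dirs_then_start" because no tablets are striped across the missing directory. With `auto_fix_dirs=False`, the same inputs keep today's behavior.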

            People

              Assignee: Unassigned
              Reporter: Andrew Wong (awong)