
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: M5
    • Fix Version/s: 1.6.0
    • Component/s: fs
    • Labels: None

    Description

      Disk loss is an unfortunate fact of life, and Kudu should provide mechanisms for mitigating it.

      1. Make it possible to restrict a tablet to a subset of the machine's disks, so that a single disk failure doesn't take out every tablet on the machine. This is more complicated than it looks:
        • We need a concrete way of describing disk groups. The description could be per-node, or abstract enough to make sense across the entire cluster, or perhaps an aggregate (e.g. ten machines have five disks and the other forty machines have six).
        • This mechanism needs to cover both data blocks and other metadata (master blocks, superblocks, and other miscellaneous files).
        • Presumably it needs to be provided when a table is created (or a tablet is split), and it needs to be persisted as part of tablet metadata. It might be sufficient to express it in Kudu configuration (i.e. complex gflags), but since it can be associated with tablet metadata, it's hard to see how that would work.
      2. When a disk fails, the server needs to handle it appropriately (mark it as failed, put affected tablets in a failed state, etc.).
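
      The two points above can be illustrated with a small sketch. It is not Kudu's implementation (Kudu itself is written in C++), and every name in it (`DiskGroupManager`, `create_group`, `mark_disk_failed`, the `/data/N` paths) is hypothetical. It assumes the simplest of the options discussed, per-node disk groups, and assumes a least-loaded placement policy, which the description does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DiskGroupManager:
    """Hypothetical per-node bookkeeping for tablet-to-disk groups."""

    disks: List[str]                 # all data directories on this node
    group_size: int                  # assumed target number of disks per tablet
    failed_disks: Set[str] = field(default_factory=set)
    tablet_groups: Dict[str, List[str]] = field(default_factory=dict)
    failed_tablets: Set[str] = field(default_factory=set)

    def create_group(self, tablet_id: str) -> List[str]:
        """Assign a new tablet to the least-loaded healthy disks."""
        healthy = [d for d in self.disks if d not in self.failed_disks]
        if not healthy:
            raise RuntimeError("no healthy disks available")
        # Count how many existing tablets already use each healthy disk.
        load = {d: 0 for d in healthy}
        for group in self.tablet_groups.values():
            for d in group:
                if d in load:
                    load[d] += 1
        # sorted() is stable, so ties keep the configured disk order.
        chosen = sorted(healthy, key=lambda d: load[d])[: self.group_size]
        # In a real system this group would be persisted with the
        # tablet's metadata; here it only lives in memory.
        self.tablet_groups[tablet_id] = chosen
        return chosen

    def mark_disk_failed(self, disk: str) -> List[str]:
        """Mark a disk as failed and fail every tablet whose group uses it."""
        if disk not in self.disks:
            raise ValueError(f"unknown disk: {disk}")
        self.failed_disks.add(disk)
        affected = sorted(
            t for t, group in self.tablet_groups.items() if disk in group
        )
        self.failed_tablets.update(affected)
        return affected


if __name__ == "__main__":
    mgr = DiskGroupManager(
        disks=["/data/1", "/data/2", "/data/3", "/data/4"], group_size=2
    )
    mgr.create_group("tablet-a")
    mgr.create_group("tablet-b")
    # Only tablets whose group includes the failed disk are affected.
    print(mgr.mark_disk_failed("/data/1"))
```

      Because each tablet touches only `group_size` disks, a single disk failure puts only the tablets in the overlapping groups into a failed state. The rest keep running, and new tablets are placed on the remaining healthy disks.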

            People

              awong Andrew Wong
              adar Adar Dembo
              Votes: 1
              Watchers: 8
