Kudu / KUDU-2418

ksck should be able to auto-repair single replica tablets (with data loss)

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0
    • Fix Version/s: None
    • Component/s: ksck
    • Labels: None

    Description

      There's an established Kudu workflow for manually "repairing" a tablet that has only one working replica, using the unsafe_change_config CLI tool (kudu remote_replica unsafe_change_config). I put "repairing" in quotes because, while the procedure brings the tablet back to a healthy state as far as Kudu is concerned, the tablet may have suffered data loss. In some circumstances, however, that's something users are willing to accept.

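      The problem is when this happens at scale, across an entire cluster. For example, suppose a three-node cluster hosting 1000 tablets loses two nodes. Assuming the default replication factor of 3, every tablet is left with a single working replica, so the manual procedure has to be repeated for each of the 1000 tablets. It should be possible to automate this repair process so that users needn't script it themselves.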
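
      To illustrate what users currently have to script themselves, here is a minimal Python sketch that builds one repair command per tablet. It is not part of ksck. It assumes the command takes the form "kudu remote_replica unsafe_change_config <tserver_address> <tablet_id> <peer_uuid>", which should be checked against the installed Kudu version. SurvivingReplica and build_repair_commands are hypothetical names, and collecting the map of tablets to surviving replicas (for example from ksck output) is left out.

```python
from typing import Dict, List, NamedTuple


class SurvivingReplica(NamedTuple):
    """The one replica of a tablet that is still reachable (hypothetical helper type)."""
    tserver_address: str  # host:port of the tablet server hosting the replica
    tserver_uuid: str     # UUID of that tablet server


def build_repair_commands(
    survivors: Dict[str, SurvivingReplica],
) -> List[List[str]]:
    """Return one unsafe_change_config command (as an argv list) per tablet.

    `survivors` maps tablet ID -> the single working replica of that tablet.
    The argument order is an assumption; verify it with the tool's --help.
    Nothing is executed here: running these commands can discard data.
    """
    commands = []
    for tablet_id, replica in sorted(survivors.items()):
        commands.append([
            "kudu", "remote_replica", "unsafe_change_config",
            replica.tserver_address,  # server that hosts the surviving replica
            tablet_id,                # tablet to force into a new config
            replica.tserver_uuid,     # the new config contains only this peer
        ])
    return commands


if __name__ == "__main__":
    # Made-up tablet IDs, address, and UUID, for illustration only.
    example = {
        "tablet-b": SurvivingReplica("ts1.example.com:7050", "uuid-ts1"),
        "tablet-a": SurvivingReplica("ts1.example.com:7050", "uuid-ts1"),
    }
    for argv in build_repair_commands(example):
        print(" ".join(argv))
```

      The sketch only prints the commands so that an operator can review them before running anything, since each one may make data loss permanent for that tablet.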

            People

              Assignee: Unassigned
              Reporter: Adar Dembo
              Votes: 0
              Watchers: 4