Kudu / KUDU-2136

Add a "crashed"/"failed" mode to tablets


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: None
    • Labels: None

    Description

      There are a number of errors that currently crash Kudu tablet servers (e.g. disk failure). In the push to keep tablet servers alive in spite of these failures, the affected tablets should no longer service any type of request. Writes should not proceed, scans should be bounced to another tablet server, flushes and compactions should exit early, etc. The tablet should act as though it were deleted, with the exception that its data is not yet deleted, for the sake of durability in case it is the last remaining replica.

      This mode may need to be entered from a myriad of tablet operations: from corruptions found while reading a cfile, from disk failures when flushing to disk (while locks may be held), etc.
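
The behavior described above can be illustrated with a minimal sketch. This is not Kudu's implementation (Kudu is written in C++); every name here (`Tablet`, `TabletState`, `TabletUnavailable`, `mark_failed`, the `write_to_disk` callback) is hypothetical and chosen only for illustration. It assumes a single "failed" state that any operation can enter, that rejects reads and writes so callers can retry elsewhere, that makes maintenance work exit early, and that leaves the data untouched.

```python
from enum import Enum, auto


class TabletState(Enum):
    RUNNING = auto()
    FAILED = auto()   # no longer serves requests, but data is kept


class TabletUnavailable(Exception):
    """Signals the caller to retry against another replica (hypothetical)."""


class Tablet:
    def __init__(self, tablet_id):
        self.tablet_id = tablet_id
        self.state = TabletState.RUNNING
        self.error = None
        self.rows = []  # stands in for the tablet's on-disk data

    def mark_failed(self, error):
        # Safe to call from any operation and more than once:
        # only the first error is recorded, and no data is deleted.
        if self.state is TabletState.RUNNING:
            self.state = TabletState.FAILED
            self.error = error

    def _check_running(self):
        if self.state is not TabletState.RUNNING:
            raise TabletUnavailable(
                f"tablet {self.tablet_id} has failed: {self.error}")

    def write(self, row):
        self._check_running()
        self.rows.append(row)

    def scan(self):
        self._check_running()
        return list(self.rows)

    def flush(self, write_to_disk):
        # Maintenance work exits early instead of raising.
        if self.state is not TabletState.RUNNING:
            return False
        try:
            write_to_disk(self.rows)
        except OSError as e:
            # A disk failure fails this tablet, not the whole server.
            self.mark_failed(e)
            return False
        return True
```

In this sketch a failing `write_to_disk` during `flush` moves the tablet to `FAILED`; later `write` and `scan` calls raise `TabletUnavailable`, later `flush` calls return `False` immediately, and `rows` is left as it was, which mirrors the requirement that data survive in case this is the last remaining replica.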


          People

            Assignee: Andrew Wong (awong)
            Reporter: Andrew Wong (awong)
            Votes: 0
            Watchers: 1

