Kudu / KUDU-808 Throttle/reject writes to avoid running out of memory / KUDU-559

Reject/timeout requests when consensus majority is down


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: M5
    • Fix Version/s: None
    • Component/s: consensus
    • Labels: None

    Description

      Currently, if a node is the leader but a majority of the nodes in the quorum have crashed, it still allows writers to submit messages to the consensus queue and never times them out. Eventually the queue fills up and new callers receive "queue full" errors, but the RPCs associated with the messages already stuck in the queue never receive a response. Instead, those RPCs should be timed out or otherwise answered promptly.
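
      A minimal sketch of the requested behavior, in Python rather than Kudu's C++ and not taken from the Kudu code base: each queued write carries a deadline, and a periodic sweep answers expired entries instead of leaving their RPCs hanging. The names `PendingQueue`, `PendingWrite`, `respond`, and the status strings are hypothetical. The sketch assumes a single fixed timeout and a non-decreasing clock, so deadlines are ordered and only the head of the queue needs checking. It only answers the RPC; whether the underlying operation is also aborted is left open.

      ```python
      import collections
      from dataclasses import dataclass
      from typing import Callable, Deque


      @dataclass
      class PendingWrite:
          op_id: int
          deadline: float                 # absolute time by which a response is due
          respond: Callable[[str], None]  # hypothetical callback that answers the RPC


      class PendingQueue:
          """Bounded queue of writes waiting for majority replication (illustrative)."""

          def __init__(self, max_size: int, timeout_s: float) -> None:
              self.max_size = max_size
              self.timeout_s = timeout_s
              self._queue: Deque[PendingWrite] = collections.deque()

          def submit(self, op_id: int, now: float, respond: Callable[[str], None]) -> bool:
              # Reject immediately when the queue is full.
              if len(self._queue) >= self.max_size:
                  respond("QUEUE_FULL")
                  return False
              self._queue.append(PendingWrite(op_id, now + self.timeout_s, respond))
              return True

          def ack_up_to(self, op_id: int) -> None:
              # A majority has replicated everything up to op_id: answer those RPCs.
              while self._queue and self._queue[0].op_id <= op_id:
                  self._queue.popleft().respond("OK")

          def expire(self, now: float) -> int:
              # Answer every entry whose deadline has passed; return how many expired.
              expired = 0
              while self._queue and self._queue[0].deadline <= now:
                  self._queue.popleft().respond("TIMED_OUT")
                  expired += 1
              return expired
      ```

      Calling `expire` from a periodic timer bounds how long a caller waits even when no majority is reachable, and it frees queue slots so later writers are not rejected solely because of stale entries.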

      We also need to handle the case where an old leader has become partitioned from its quorum. At some point it needs to discover that it has not heartbeated successfully for longer than the election timeout and step down on its own; otherwise clients may keep contacting it and never detect that they need to find the new leader.
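
      The step-down check could look roughly like the following Python sketch, which is not Kudu's implementation. The class `LeaderLease` and its methods are hypothetical. It assumes the leader records the time of the last successful heartbeat acknowledgement from each peer and counts itself toward the majority.

      ```python
      from typing import Dict, Iterable


      class LeaderLease:
          """Tracks heartbeat acks and steps down when a majority is unreachable."""

          def __init__(self, peers: Iterable[str], election_timeout_s: float, now: float) -> None:
              self.election_timeout_s = election_timeout_s
              # Treat every peer as freshly contacted at the moment of election.
              self.last_ack: Dict[str, float] = {peer: now for peer in peers}
              self.is_leader = True

          def record_heartbeat_ack(self, peer: str, now: float) -> None:
              self.last_ack[peer] = now

          def has_majority_contact(self, now: float) -> bool:
              # The leader counts itself, plus every peer heard from within the timeout.
              fresh = 1 + sum(
                  1 for t in self.last_ack.values() if now - t < self.election_timeout_s
              )
              cluster_size = len(self.last_ack) + 1
              return fresh > cluster_size // 2

          def tick(self, now: float) -> bool:
              # Called periodically; relinquish leadership once the majority is stale.
              if self.is_leader and not self.has_majority_contact(now):
                  self.is_leader = False
              return self.is_leader
      ```

      Once `tick` returns `False`, the node would reject writes with a "not leader" style error, which lets clients go look for the new leader instead of retrying against the partitioned one.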

          People

            Assignee: Adar Dembo (adar)
            Reporter: Todd Lipcon (tlipcon)
            Votes: 0
            Watchers: 2