Kudu / KUDU-2160

Reduce UpdateConsensus RPC timeouts

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: n/a
    • Component/s: consensus
    • Labels:
      None

      Description

      UpdateConsensus() RPCs frequently time out when disks are slow. We need to investigate this further to understand the dynamics, then find a solution.

      When the local disks on a Kudu cluster are overloaded, the RaftConsensus metadata fsyncs triggered by Raft votes and term changes take longer, and the RaftConsensus lock stays held while they complete. UpdateConsensus() RPCs that need the lock "stack" up behind it, resulting in timeouts.
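
The stacking effect described above can be illustrated with a toy queueing model. This is a sketch under stated assumptions, not Kudu code: the function name, the FIFO lock model, the 1 ms per-RPC service time, and all timing values are hypothetical and chosen only to show how one slow fsync under a lock can turn into several RPC timeouts.

```python
def count_timed_out_rpcs(fsync_ms, rpc_interval_ms, rpc_timeout_ms,
                         n_rpcs, service_ms=1):
    """Toy model: a lock is held from t=0 until a metadata fsync finishes.

    RPCs arrive every rpc_interval_ms and need the lock for service_ms.
    An RPC counts as timed out if it cannot finish within rpc_timeout_ms
    of its arrival. All times are integer milliseconds (hypothetical values).
    """
    lock_free_at = fsync_ms  # the fsync holds the lock until this time
    timed_out = 0
    for i in range(n_rpcs):
        arrival = i * rpc_interval_ms
        start = max(arrival, lock_free_at)  # wait for the lock if it is held
        finish = start + service_ms
        if finish - arrival > rpc_timeout_ms:
            # Assumption: a timed-out RPC is abandoned and never takes the lock.
            timed_out += 1
            continue
        lock_free_at = finish
    return timed_out
```

In this model, a 50 ms fsync with RPCs arriving every 100 ms and a 1000 ms timeout produces no timeouts across ten RPCs, while a 3000 ms fsync with RPCs arriving every 500 ms causes five of the ten to time out, even though each RPC needs the lock for only 1 ms.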

              People

              • Assignee:
                Unassigned
                Reporter:
                mpercy Mike Percy
