[KUDU-2160] Reduce UpdateConsensus RPC timeouts - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.5.0
Fix Version/s: n/a
Component/s: consensus
Labels: None

Description

We often see many UpdateConsensus() RPCs time out when disks are slow. We need to investigate this further to understand the dynamics, then find a solution.

When the local disks on a Kudu cluster get overloaded, RaftConsensus metadata fsyncs caused by Raft votes and term changes take longer, so the RaftConsensus lock is held for longer. This causes UpdateConsensus() RPCs to "stack" up behind the lock, resulting in timeouts.
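
The stacking effect described above can be illustrated with a small deterministic model. This is only a sketch under stated assumptions, not Kudu code: a single lock serializes all requests, the first request holds the lock for the duration of a metadata fsync, and the names (`Rpc`, `simulate`, `workload`) along with the 100 ms arrival interval and 1000 ms timeout are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rpc:
    arrival_ms: float  # when the request reaches the replica
    hold_ms: float     # how long it holds the lock (including any fsync)


def simulate(rpcs, timeout_ms):
    """Serve RPCs in arrival order behind one lock.

    Returns (latencies, timed_out), where each latency is measured from
    arrival to completion and timed_out counts latencies above timeout_ms.
    """
    lock_free_at = 0.0
    latencies = []
    for rpc in sorted(rpcs, key=lambda r: r.arrival_ms):
        start = max(rpc.arrival_ms, lock_free_at)  # wait for the lock
        lock_free_at = start + rpc.hold_ms         # release after the work
        latencies.append(lock_free_at - rpc.arrival_ms)
    timed_out = sum(1 for lat in latencies if lat > timeout_ms)
    return latencies, timed_out


def workload(fsync_ms):
    # Assumption: the first RPC triggers a metadata fsync (for example a
    # term change); nine cheap follow-up RPCs arrive 100 ms apart.
    rpcs = [Rpc(0.0, fsync_ms)]
    rpcs += [Rpc(100.0 * i, 1.0) for i in range(1, 10)]
    return rpcs


# Hypothetical timeout of 1000 ms for every request.
fast_lat, fast_timeouts = simulate(workload(fsync_ms=5.0), timeout_ms=1000.0)
slow_lat, slow_timeouts = simulate(workload(fsync_ms=3000.0), timeout_ms=1000.0)

print("fast disk timeouts:", fast_timeouts)
print("slow disk timeouts:", slow_timeouts)
```

In this toy model a single slow fsync is enough to make every request that arrives while the lock is held exceed its timeout, even though each of those requests needs the lock for only 1 ms itself.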

Issue Links

duplicates: KUDU-1788 Raft UpdateConsensus retry behavior on timeout is counter-productive (Resolved)

People

Assignee: Unassigned

Reporter: Mike Percy

Votes: 0

Watchers: 2

Dates

Created: 25/Sep/17 21:24

Updated: 27/Sep/17 19:53

Resolved: 27/Sep/17 19:53