[KUDU-1127] Avoid holding RPC handler threads on replicas that are part of a degraded tablet - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Private Beta
Fix Version/s: 1.2.0
Component/s: tserver
Labels: None
Target Version/s: 1.2.0

Description

If the client performs a snapshot scan, we may need to wait for the leader to tell us that the timestamp is "safe". If a majority of the nodes in a tablet are down, this will never happen. After ~~KUDU-689~~, we'll wait with a deadline, but even this multi-second wait will end up blocking a lot of RPC handler threads, potentially preventing other useful work from getting done.

We should probably short-circuit the wait when we haven't heard from any leader within the election timeout, and just respond immediately. Alternatively, we could make this an async callback instead of a blocking wait on the handler thread.
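
A minimal sketch of the short-circuit idea, written in Python for brevity rather than in Kudu's C++. Everything here is an assumption made for illustration: the `ReplicaState` class, its fields, and the `"OK"` / `"UNAVAILABLE"` / `"TIMED_OUT"` result strings are hypothetical and do not reflect the actual tserver code or the fix that was committed.

```python
import threading
import time


class ReplicaState:
    """Hypothetical per-replica state; not Kudu's actual data structure."""

    def __init__(self, election_timeout_s, last_leader_contact):
        self.election_timeout_s = election_timeout_s
        # time.monotonic() value of the last message received from a leader.
        self.last_leader_contact = last_leader_contact
        self.safe_time = 0
        self._cond = threading.Condition()

    def advance_safe_time(self, new_safe_time):
        """Called when a leader message moves safe time forward."""
        with self._cond:
            self.safe_time = max(self.safe_time, new_safe_time)
            self.last_leader_contact = time.monotonic()
            self._cond.notify_all()

    def wait_for_safe_time(self, snapshot_ts, deadline_s):
        """Wait until snapshot_ts is safe, but give up early if no leader
        has been heard from within the election timeout."""
        end = time.monotonic() + deadline_s
        with self._cond:
            while self.safe_time < snapshot_ts:
                now = time.monotonic()
                silence = now - self.last_leader_contact
                if silence >= self.election_timeout_s:
                    # Short-circuit: the tablet looks degraded, so release
                    # the handler thread instead of waiting out the deadline.
                    return "UNAVAILABLE"
                remaining = end - now
                if remaining <= 0:
                    return "TIMED_OUT"
                # Wake up no later than the moment the leader would be
                # considered silent, so the check above runs again.
                until_silent = self.election_timeout_s - silence
                self._cond.wait(timeout=min(remaining, until_silent))
            return "OK"
```

With this shape, a scan against a replica that has not heard from a leader returns almost immediately, while a replica with a live leader still waits up to the scan deadline. The async-callback alternative would avoid occupying a handler thread in both cases, at the cost of a larger change to the scan path.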

People

Assignee: David Alves

Reporter: Todd Lipcon

Dates

Created: 09/Sep/15 11:36

Updated: 08/Dec/16 16:31

Resolved: 08/Dec/16 16:31