[KUDU-1703] Handle snapshot reads that might block indefinitely - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.1.0
Fix Version/s: None
Component/s: None
Labels: None
Target Version/s: 1.2.0

Description

Once safe time advancement is fixed, a replica serving a snapshot scan will block until it has a consistent view of the world at the scan's timestamp, or until the scan times out. This will be a serious problem for lagging replicas, which might be several seconds or even minutes behind.

Moreover, in the absence of writes, the same will happen even for non-lagging replicas, whose safe time is updated only when the leader heartbeats.

We need to at least make sure that:

- Blocked scanner threads do not starve other work.
- If the replica's safe time is lagging by a lot, the replica refuses to do the scan and the client retries it on another replica.
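
The second requirement can be sketched as a small decision rule. This is only an illustration of the idea, not Kudu's implementation: `Replica`, `safe_time_us`, `max_wait_us`, `decide` and `pick_replica` are all hypothetical names, and the fixed lag threshold is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class ScanDecision(Enum):
    SERVE = "serve"    # safe time already covers the snapshot timestamp
    WAIT = "wait"      # small lag: block briefly until safe time advances
    REFUSE = "refuse"  # large lag: reject so the client can try elsewhere


@dataclass
class Replica:
    # Hypothetical model of a replica; not a Kudu class.
    name: str
    safe_time_us: int  # assumed: current safe time in microseconds


def decide(replica: Replica, snapshot_ts_us: int, max_wait_us: int) -> ScanDecision:
    """Decide how a replica should treat a snapshot scan at snapshot_ts_us."""
    lag_us = snapshot_ts_us - replica.safe_time_us
    if lag_us <= 0:
        return ScanDecision.SERVE
    if lag_us <= max_wait_us:
        return ScanDecision.WAIT
    return ScanDecision.REFUSE


def pick_replica(replicas, snapshot_ts_us: int, max_wait_us: int):
    """Client-side retry: return the first replica that does not refuse, else None."""
    for replica in replicas:
        if decide(replica, snapshot_ts_us, max_wait_us) is not ScanDecision.REFUSE:
            return replica
    return None
```

With a rule like this, a replica that is minutes behind fails fast instead of holding a scanner thread until the timeout, while a replica that is only waiting on the next heartbeat still gets a bounded wait.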

We might also consider other optimizations (like pinging the leader for up-to-date replicas that are just waiting on a heartbeat).

Issue Links

contains

KUDU-1656 Scanner timeouts aren't retried when waiting on a transaction

Resolved

People

Assignee: David Alves

Reporter: David Alves

Votes: 0

Watchers: 1

Dates

Created: 14/Oct/16 18:50

Updated: 27/Nov/16 21:45

Resolved: 27/Nov/16 21:45