- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 1.1.0
- Fix Version/s: None
- Component/s: None
- Labels: None
- Target Version/s:
Once we fix safe time advancement, a replica will block each snapshot scan until it has a consistent view of the world at the scan's timestamp, or until the scan times out. This will be a serious problem for lagging replicas, which might be several seconds or even minutes behind.
Moreover, in the absence of writes, the same will happen even for non-lagging replicas, whose safe times are updated only when the leader heartbeats.
We need to at least make sure that:
- Blocked scanner threads are not starving other work.
- If the replica's safe time is lagging by a lot, the replica refuses to do the scan and the client retries it on another replica.
We might also consider other optimizations (like pinging the leader for up-to-date replicas that are just waiting on a heartbeat).
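As a rough illustration of the second requirement above, the refuse-or-wait decision could look like the sketch below. This is not Kudu code: `ScanDecision`, `decide_scan`, and the `max_lag_ms` threshold are hypothetical names made up for this example, and timestamps are simplified to plain millisecond integers.

```python
from enum import Enum


class ScanDecision(Enum):
    SERVE = "serve"    # safe time already covers the snapshot timestamp
    WAIT = "wait"      # small lag: wait, ideally without pinning a worker thread
    REFUSE = "refuse"  # large lag: reject so the client retries on another replica


def decide_scan(snapshot_ts_ms: int, safe_time_ms: int, max_lag_ms: int) -> ScanDecision:
    """Decide how a replica handles a snapshot scan (hypothetical policy).

    snapshot_ts_ms: timestamp the scan wants to read at
    safe_time_ms:   the replica's current safe time
    max_lag_ms:     assumed tunable; above this lag the replica refuses
    """
    lag_ms = snapshot_ts_ms - safe_time_ms
    if lag_ms <= 0:
        # The replica already has a consistent view at the snapshot timestamp.
        return ScanDecision.SERVE
    if lag_ms > max_lag_ms:
        # Too far behind: fail fast instead of blocking until the scan times out.
        return ScanDecision.REFUSE
    return ScanDecision.WAIT
```

With a 500 ms threshold, `decide_scan(1000, 900, 500)` would wait, while `decide_scan(1000, 100, 500)` would refuse right away. How the `WAIT` case is implemented (for example, parking the request rather than blocking a scanner thread) is what the first requirement is about and is left open here.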
- contains: KUDU-1656 Scanner timeouts aren't retried when waiting on a transaction (Resolved)