[KUDU-2288] Client should fail fast upon access to an unavailable tablet - ASF JIRA

Details

Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: supportability
Labels: None
Target Version/s: Backlog

Description

Currently, if a tablet becomes unavailable for some reason (e.g., it has lost a majority of its replicas), the client still faithfully retries a read or write operation until its maximum timeout expires. After that timeout, it sometimes reports a "timed out" error rather than something more indicative of the root cause.

The retry-on-unavailability behavior is desirable in the case of transient unavailability (e.g., a node has just failed and a re-election is occurring). But if the tablet has been unavailable for quite some time (e.g., longer than the client timeout, or longer than N heartbeat intervals for some N), then we can assume that it is unlikely to recover within the timeout, and it would be preferable to fail fast with an appropriate exception.
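To make the proposed decision concrete, here is a minimal sketch of the threshold check, not the actual Kudu client code. All names (`FailFastPolicy`, `TabletUnavailableError`, `check_tablet`, `leaderless_for_s`) are hypothetical, and the sketch assumes the client can somehow learn how long the tablet has been without a valid leader, for example from a metric like the one proposed in KUDU-2287. It treats the two conditions in the description as alternatives and fails fast once either is exceeded.

```python
from dataclasses import dataclass


class TabletUnavailableError(Exception):
    """Hypothetical error that names the root cause instead of a generic timeout."""


@dataclass
class FailFastPolicy:
    """Hypothetical policy; none of these fields are real Kudu client options."""
    client_timeout_s: float      # maximum time the client would otherwise keep retrying
    heartbeat_interval_s: float  # assumed heartbeat interval
    max_missed_heartbeats: int   # the "N" from the description

    def threshold_s(self) -> float:
        # Fail fast once the outage exceeds either bound, so use the smaller one.
        return min(self.client_timeout_s,
                   self.max_missed_heartbeats * self.heartbeat_interval_s)

    def should_fail_fast(self, leaderless_for_s: float) -> bool:
        # A short outage is treated as transient (e.g., a re-election in progress).
        return leaderless_for_s > self.threshold_s()


def check_tablet(tablet_id: str, leaderless_for_s: float, policy: FailFastPolicy) -> None:
    """Raise a descriptive error for a long outage; return normally to allow a retry."""
    if policy.should_fail_fast(leaderless_for_s):
        raise TabletUnavailableError(
            f"tablet {tablet_id} has had no valid leader for {leaderless_for_s:.0f}s; "
            f"not retrying (threshold {policy.threshold_s():.0f}s)"
        )
```

With `FailFastPolicy(client_timeout_s=30.0, heartbeat_interval_s=1.0, max_missed_heartbeats=10)`, a tablet that has been leaderless for 2 seconds would still be retried, while one that has been leaderless for 45 seconds would raise `TabletUnavailableError` immediately.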

Issue Links

relates to: KUDU-2287 Add replica metric tracking time since there was a valid leader (Resolved)

People

Assignee: Attila Bukor

Reporter: Todd Lipcon

Votes: 0

Watchers: 2

Dates

Created: 08/Feb/18 01:33

Updated: 19/Apr/18 13:19