[KUDU-1395] Scanner KeepAlive requests can get starved on an overloaded server - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.8.0
Fix Version/s: None
Component/s: impala, rpc, tserver
Labels:
- backup

Target Version/s: Backlog

Description

As of 0.8.0, the RPC system schedules RPCs on an earliest-deadline-first basis. When the service queue is full, it rejects the RPCs with the latest deadlines with a SERVER_TOO_BUSY error. This works well for RPCs that are retried on SERVER_TOO_BUSY errors: each retry keeps the original deadline, so the call gets higher and higher priority as it gets closer to timing out.
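The scheduling policy described above can be sketched as a bounded priority queue. This is a minimal Python model under stated assumptions, not Kudu's actual C++ service queue: the class `DeadlineQueue`, its `offer`/`pop` methods, and the fixed `capacity` are hypothetical names chosen for illustration. Calls are served earliest deadline first; when the queue is full, whichever call has the latest deadline (the new one or a queued one) is rejected, standing in for a SERVER_TOO_BUSY response.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Call:
    deadline: float
    seq: int                          # tie-breaker so equal deadlines stay comparable
    name: str = field(compare=False)  # not part of the ordering


class DeadlineQueue:
    """Hypothetical bounded earliest-deadline-first queue (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                   # min-heap ordered by (deadline, seq)
        self._seq = itertools.count()

    def offer(self, name, deadline):
        """Enqueue a call. Returns the name of the call rejected with
        SERVER_TOO_BUSY, or None if nothing had to be rejected."""
        call = Call(deadline, next(self._seq), name)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, call)
            return None
        latest = max(self._heap)          # queued call with the latest deadline
        if call.deadline < latest.deadline:
            # The new call is more urgent: evict the latest-deadline call.
            self._heap.remove(latest)
            heapq.heapify(self._heap)
            heapq.heappush(self._heap, call)
            return latest.name
        return name                       # the new call itself is rejected

    def pop(self):
        """Dequeue the call with the earliest deadline."""
        return heapq.heappop(self._heap).name
```

For example, with a capacity of 2 holding calls due at t=10 and t=20, a KeepAlive due at t=30 is the one rejected, while a retry due at t=5 displaces the call due at t=20.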

We don't, however, retry scanner KeepAlive RPCs. So, if a KeepAlive RPC arrives at a heavily overloaded tserver, it will likely be rejected and is never retried. This means that Impala queries or other long scans that rely on KeepAlives will likely fail on overloaded clusters: the KeepAlive never gets through, so the scanner can expire on the tserver before the client resumes scanning.
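The difference between a retried RPC and a one-shot KeepAlive can be illustrated with a toy model. Everything here is a hypothetical simplification, not Kudu client code: `overloaded_server` assumes the queue stays full of calls whose latest deadline is `horizon` seconds away, so a call is admitted only once its own deadline is nearer than that, and `send_with_retries` retries with a fixed `backoff` while keeping the original deadline.

```python
def overloaded_server(horizon):
    """Hypothetical model of a saturated tserver: the queue stays full of calls
    whose latest deadline is `horizon` seconds from now, so a new call is
    admitted only if its deadline is earlier than that."""
    def try_once(deadline, now):
        return deadline - now < horizon
    return try_once


def send_once(try_once, deadline, now):
    """A KeepAlive-style call: one attempt, no retry on rejection."""
    return try_once(deadline, now)


def send_with_retries(try_once, deadline, now, backoff):
    """Retry until admitted or timed out. Each retry reuses the ORIGINAL
    deadline, so its remaining time shrinks and its priority rises."""
    while now < deadline:
        if try_once(deadline, now):
            return True
        now += backoff
    return False


if __name__ == "__main__":
    server = overloaded_server(horizon=2.0)
    print(send_once(server, deadline=10.0, now=0.0))
    print(send_with_retries(server, deadline=10.0, now=0.0, backoff=1.0))
```

With `horizon=2.0` and a deadline 10 seconds out, the single attempt is rejected, while retrying once per second is admitted at t=9, when only 1 second remains.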

People

Assignee: Todd Lipcon

Reporter: Todd Lipcon

Votes: 0

Watchers: 5

Dates

Created: 05/Apr/16 22:33

Updated: 29/Mar/19 15:34