[IMPALA-9233] Add impalad level metrics for query retries - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Backend
Labels: None

Description

It would be nice to have some impalad-level metrics related to query retries. These would help answer questions such as: how often are queries retried, and how often do the retries actually succeed? If queries are constantly being retried, then there is probably something wrong with the cluster.

Some possible metrics to add:

- Query retry rate (the rate at which queries are retried)
  - This can be further divided by retry “type”, i.e. what caused the retry
  - Potential categories would be:
    - Queries retried due to failed RPCs
    - Queries retried due to faulty disks
    - Queries retried due to statestore detection of cluster membership changes
- A metric that measures how often query retries are actually successful (i.e. if a query is retried, does the retry succeed, or does it just fail again?)
  - This can help users determine whether query retries are actually helping or just adding overhead (e.g. if retries always fail, then something is probably wrong)
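
To make the proposal above more concrete, here is a minimal sketch of the two metric families as plain counters. It is written in Python for brevity and is not Impala's metrics API: the names RetryCause, QueryRetryMetrics, record_retry and record_retry_outcome are all hypothetical. It assumes the retry rate would be derived by the monitoring system from a monotonically increasing counter, and that the success metric is a ratio over retries that have finished.

```python
from collections import Counter
from enum import Enum


class RetryCause(Enum):
    # Categories taken from the list above; the identifiers are hypothetical.
    FAILED_RPC = "failed_rpc"
    FAULTY_DISK = "faulty_disk"
    MEMBERSHIP_CHANGE = "membership_change"


class QueryRetryMetrics:
    """Process-wide retry counters (illustrative only, not Impala code)."""

    def __init__(self):
        # Number of retries started, keyed by what caused the retry.
        self.retries_by_cause = Counter()
        # Number of finished retries, keyed by "succeeded" or "failed".
        self.retry_outcomes = Counter()

    def record_retry(self, cause: RetryCause) -> None:
        """Call when a query is retried."""
        self.retries_by_cause[cause] += 1

    def record_retry_outcome(self, succeeded: bool) -> None:
        """Call when a retried query finishes."""
        self.retry_outcomes["succeeded" if succeeded else "failed"] += 1

    def total_retries(self) -> int:
        return sum(self.retries_by_cause.values())

    def retry_success_ratio(self):
        """Fraction of finished retries that succeeded, or None if none finished."""
        finished = self.retry_outcomes["succeeded"] + self.retry_outcomes["failed"]
        if finished == 0:
            return None
        return self.retry_outcomes["succeeded"] / finished


metrics = QueryRetryMetrics()
metrics.record_retry(RetryCause.FAILED_RPC)
metrics.record_retry(RetryCause.FAILED_RPC)
metrics.record_retry(RetryCause.FAULTY_DISK)
metrics.record_retry_outcome(True)
metrics.record_retry_outcome(False)
metrics.record_retry_outcome(False)
```

Keeping the per-cause counts separate from the outcome counts means a consistently low success ratio can be read alongside the dominant retry cause, which is the kind of signal the description suggests would indicate a cluster problem.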

People

Assignee: Unassigned

Reporter: Sahil Takiar

Votes: 0

Watchers: 1

Dates

Created: 11/Dec/19 00:39

Updated: 21/Mar/20 19:28