[KUDU-2144] Add metric for reactor load - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.6.0
Component/s: metrics, ops-tooling
Labels: None

Target Version/s: 1.6.0

Description

Recently I was debugging a cluster that appeared to have network issues. Only after lots of investigation did I realize that the reactor threads were not keeping up with network traffic because they were hitting KUDU-1964 (this cluster was running 1.3.0). At first glance the reactors did not seem busy, since each was using only ~25% of a CPU. However, the other 75% of the time was spent blocked on OpenSSL locks, not in epoll_wait as one would normally expect.

This would be easier to diagnose if we had a metric showing how much time the reactors spend idle (i.e. in epoll_wait) vs doing work (i.e. executing callbacks, IO, etc). If any reactor spends a high percentage of its time outside epoll_wait, that suggests the reactors may be a bottleneck that is increasing latency or degrading throughput.
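As a rough illustration of the proposed metric (a minimal sketch, not the implementation that went into Kudu), the Python below splits a loop's wall-clock time into "idle" (inside the wait call) and "active" (everything else). The names `ReactorLoadTracker`, `FakeClock`, and `simulate` are hypothetical, and the fake clock stands in for a real event loop so the numbers are deterministic. Because it measures wall-clock time rather than CPU time, time spent blocked on a lock counts as active, which is the case CPU usage alone hides.

```python
import time


class ReactorLoadTracker:
    """Accumulates wall-clock time a (hypothetical) reactor loop spends idle vs. active."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last = clock()
        self.idle_seconds = 0.0
        self.active_seconds = 0.0

    def _elapsed(self):
        # Time since the previous mark; also resets the mark.
        now = self._clock()
        delta = now - self._last
        self._last = now
        return delta

    def before_wait(self):
        # Call just before blocking in the wait call (e.g. epoll_wait):
        # everything since the last mark was work, or time blocked on a lock.
        self.active_seconds += self._elapsed()

    def after_wait(self):
        # Call right after the wait call returns: that interval was idle.
        self.idle_seconds += self._elapsed()

    def load_percent(self):
        total = self.idle_seconds + self.active_seconds
        return 100.0 * self.active_seconds / total if total > 0 else 0.0


class FakeClock:
    """Deterministic clock used only for this illustration."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def simulate(iterations=4, wait_s=0.25, work_s=0.75):
    clock = FakeClock()
    tracker = ReactorLoadTracker(clock)
    for _ in range(iterations):
        tracker.before_wait()
        clock.advance(wait_s)  # stands in for time spent in epoll_wait
        tracker.after_wait()
        clock.advance(work_s)  # callbacks, IO, or blocked on a lock
    tracker.before_wait()      # account for the final stretch of work
    return tracker


tracker = simulate()
print(f"reactor load: {tracker.load_percent():.1f}%")
```

With the defaults above, each iteration spends 0.25 s waiting and 0.75 s outside the wait, so the tracker reports a 75% load even if most of that 0.75 s was spent blocked rather than burning CPU.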

People

Assignee: Todd Lipcon

Reporter: Todd Lipcon

Votes: 0

Watchers: 1

Dates

Created: 14/Sep/17 00:09

Updated: 15/Sep/17 05:48

Resolved: 15/Sep/17 05:48