[ZOOKEEPER-3037] Add JvmPauseMonitor to ZooKeeper - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.4.0, 3.5.0
Fix Version/s: 3.6.0
Component/s: contrib
Labels:
- pull-request-available

Description

After a ZK crash or a client timeout, it is sometimes hard to determine from the logs what happened. Knowing whether ZK was responsive at the time would help a lot. For example, ZK might have spent a long time waiting on GC (there is still a misconception that ZK is a storage system).

To help detect this, Hadoop already has a great tool called JVM Pause Monitor. (As the name suggests, it can also be used for monitoring, but it also helps with post-mortem analysis in many cases.) It runs a daemon thread that sleeps for one second; if the actual sleep time exceeds 1s by more than a threshold (1s: INFO, 10s: WARN by default; these can be made configurable in our case, see below), it writes a log entry. It can also report the time spent in GC.
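The sleep-and-measure idea described above can be sketched roughly as follows. This is a minimal illustration and not the Hadoop class: the class name `PauseMonitorSketch`, its method names, and the use of `System.err` instead of a logger are all assumptions made for the example, and the thresholds simply reuse the defaults quoted above.

```java
import java.util.concurrent.TimeUnit;

public class PauseMonitorSketch {
    // Hypothetical constants reusing the numbers from the description.
    static final long SLEEP_MS = 1000;
    static final long INFO_THRESHOLD_MS = 1000;
    static final long WARN_THRESHOLD_MS = 10000;

    /** Maps the extra time beyond the requested sleep to a log level, or null if below both thresholds. */
    static String classify(long extraMs, long infoMs, long warnMs) {
        if (extraMs > warnMs) {
            return "WARN";
        }
        if (extraMs > infoMs) {
            return "INFO";
        }
        return null;
    }

    /** Sleeps once and returns how much longer than requested the sleep actually took. */
    static long measureExtraSleepMs(long sleepMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMs);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return Math.max(0, elapsedMs - sleepMs);
    }

    /** Starts a daemon thread that repeats the measurement until it is interrupted. */
    static Thread start() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    long extra = measureExtraSleepMs(SLEEP_MS);
                    String level = classify(extra, INFO_THRESHOLD_MS, WARN_THRESHOLD_MS);
                    if (level != null) {
                        // A real implementation would use the project's logger here.
                        System.err.println(level + ": detected pause of approximately " + extra + "ms");
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "PauseMonitorSketch");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Because the thread only sleeps, any large gap between the requested and the observed sleep time points to the whole JVM (or the host) having stalled, which is what makes the approach useful for post-mortem analysis.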

The class implementing this is in hadoop-common, but ZK should not depend on that package. Since the implementation is straightforward, and the few commits it has received in the past five years were nothing serious, I think we could just copy this class into ZooKeeper and introduce it as a configurable feature that is off by default.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:

- Create a class in ZK (under zookeeper/server/util/) called JvmPauseMonitor.
- Make the feature configurable, off by default.
- Make the sleep time and the threshold times configurable.
- Update the documentation.
- Add [current size of the heap OR % of heap used] to the log entry whenever the sleep threshold is exceeded by a lot (10s).
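For the last task item, heap usage can be read through the standard `java.lang.management` API. The following is only a sketch of one possible way to build the log fragment; the class name `HeapUsageSketch`, its methods, and the message format are hypothetical and not taken from the ticket or the eventual patch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsageSketch {
    /** Formats used heap in MB, plus the percentage of max heap when a max is defined. */
    static String format(long usedBytes, long maxBytes) {
        long usedMb = usedBytes / (1024 * 1024);
        if (maxBytes <= 0) {
            // MemoryUsage.getMax() returns -1 when the maximum is undefined.
            return "heap used=" + usedMb + "MB (max undefined)";
        }
        long percent = usedBytes * 100 / maxBytes;
        return "heap used=" + usedMb + "MB (" + percent + "% of max)";
    }

    /** Reads the current heap usage of this JVM and formats it for a log entry. */
    static String currentHeapUsage() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return format(heap.getUsed(), heap.getMax());
    }
}
```

The result of `HeapUsageSketch.currentHeapUsage()` could be appended to the pause message only when the larger (10s) threshold is exceeded, so that ordinary log entries stay short.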

Issue Links

is related to

ZOOKEEPER-4202 Add JvmPauseMonitor to ZooKeeper on branch 3.5

Resolved

links to

GitHub Pull Request #904

GitHub Pull Request #1594

People

Assignee: Norbert Kalmár

Reporter: Norbert Kalmár

Votes: 2

Watchers: 8

Dates

Created: 09/May/18 14:57

Updated: 17/Nov/22 13:33

Resolved: 18/Apr/19 17:18

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 2.5h