Details
Type: Task
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Description
Chatting today w/ a mighty hbase operator on how to figure out what is happening during a transitory latency spike or any other transitory 'weirdness' in a server, the idea came up that a Java Flight Recorder (JFR) recording taken during a spike would give a pretty good picture of what is going on during the time of duress. (More ideal would be a trace of the explicit slow queries showing call stack with timings, dumped to a sink for later review; i.e. trigger an htrace when a query is slow...)
Taking a look, programmatically triggering a JFR recording seems doable, if awkward (MBean invocations). There is even a means of specifying 'triggers' based off any published mbean emission – e.g. a query queue count threshold – which looks nice. See https://community.oracle.com/thread/3676275?start=0&tstart=0 and https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/run.htm#JFRUH184
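As a rough illustration of the "MBean invocations" route, here is a minimal sketch. It assumes JDK 11 or later, where JFR ships with the JDK and exposes `jdk.management.jfr.FlightRecorderMXBean`; the JMC 5.4-era runtime linked above used a different, older interface. The class name `JfrSpikeRecorder` and its `record` helper are made up for this example, and it talks to the local JVM only.

```java
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import jdk.management.jfr.FlightRecorderMXBean;

// Hypothetical helper, not part of HBase: drives JFR through its platform MXBean.
class JfrSpikeRecorder {

    /**
     * Records for durationMillis using the JDK's predefined "default"
     * settings, then writes the recording to output.
     */
    static Path record(Path output, long durationMillis) throws Exception {
        FlightRecorderMXBean jfr =
            ManagementFactory.getPlatformMXBean(FlightRecorderMXBean.class);

        long id = jfr.newRecording();                  // create, not yet started
        jfr.setPredefinedConfiguration(id, "default"); // low-overhead event settings
        jfr.startRecording(id);
        try {
            Thread.sleep(durationMillis);              // let the spike play out
            jfr.stopRecording(id);
            // The file is written on the machine where the recorded JVM runs.
            jfr.copyTo(id, output.toAbsolutePath().toString());
        } finally {
            jfr.closeRecording(id);                    // release recording resources
        }
        return output;
    }
}
```

Against a remote server, the same interface should be reachable over a JMX connection, e.g. via `ManagementFactory.newPlatformMXBeanProxy` with the object name `jdk.management.jfr:type=FlightRecorder`; that variant is not shown here.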
This feature could start out as a blog post describing how to do it for one server. Next could be a plugin on Canary that looks at mbean values and, if one is over a configured threshold, triggers a recording remotely. Finally, we could integrate a couple of triggers that fire when there is an issue, via the trigger mechanism.
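The threshold check such a plugin would need could look roughly like the sketch below. This is not Canary code: `MBeanThresholdTrigger` and `checkOnce` are hypothetical names, and the bean, attribute, and threshold are left as parameters because no specific HBase metric is pinned down here. It uses only the standard `javax.management` API.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper: one polling step of a threshold-based trigger.
class MBeanThresholdTrigger {

    /**
     * Reads a numeric MBean attribute and runs onBreach if the value is
     * strictly greater than threshold. Returns true if the action ran.
     */
    static boolean checkOnce(MBeanServerConnection conn, ObjectName bean,
                             String attribute, double threshold,
                             Runnable onBreach) throws Exception {
        Object value = conn.getAttribute(bean, attribute);
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException(
                attribute + " is not numeric: " + value);
        }
        boolean breached = ((Number) value).doubleValue() > threshold;
        if (breached) {
            onBreach.run(); // e.g. kick off a flight recording
        }
        return breached;
    }
}
```

To try it locally, pass `ManagementFactory.getPlatformMBeanServer()` as the connection and a JVM bean such as `java.lang:type=Threading` with attribute `ThreadCount` as a stand-in metric. A real plugin would call this on a schedule over a remote JMX connection and would probably want a cool-down so that a sustained breach does not start a new recording on every poll.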
Marking as beginner feature.