Ambari's built-in "JVM GC Times" graph in the HBase - RegionServers Grafana dashboard is very wrong and doesn't reflect the pause times I've grepped across the HBase RegionServer logs for util.JvmPauseMonitor.
I've inherited a very heavily loaded HBase + OpenTSDB cluster that is losing RegionServers: GC pauses of around 30 seconds cause ZooKeeper + HMaster to declare them dead. The Grafana graph shows peaks of around 70ms because it averages the GC time spent over all seconds, which smooths out the peaks so that no problem is visible. If you are going to use GCTimeMillis then I believe you need to divide it by GCCount.
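To illustrate the divide-by-count point, here is a minimal sketch of computing the average pause per collection from two successive samples of the cumulative counters. The metric names GCTimeMillis and GCCount are the ones discussed above; the sample values are hypothetical.

```python
# Sketch: average pause per collection between two samples, rather than
# averaging GC time over wall-clock seconds (which smooths out long pauses).
# All sample numbers below are hypothetical.

def avg_pause_ms(prev, curr):
    """prev/curr are (GCTimeMillis, GCCount) cumulative counter samples."""
    dt = curr[0] - prev[0]   # GC milliseconds spent in the interval
    dc = curr[1] - prev[1]   # collections completed in the interval
    return dt / dc if dc else 0.0

# A single 30 s pause in a 60 s window averages out to only ~0.5 "GC seconds
# per second" on a time-based graph, but the per-collection average exposes it.
prev = (120_000, 400)            # cumulative ms, cumulative count
curr = (150_000, 401)            # one collection took ~30 s
print(avg_pause_ms(prev, curr))  # → 30000.0
```

This is the same data the dashboard already has; only the division changes what the graph shows.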
Otherwise I believe this is actually the wrong metric to be watching; instead, the following metric from HBase JMX should be monitored with an aggregator of last. This does show the significant GC time spent:
Obviously make it search for a regex to match whichever garbage collector you are using, whether G1 or CMS etc:
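As a sketch of what that regex match could look like, the snippet below filters the JSON a RegionServer's /jmx servlet returns down to the garbage-collector beans. The bean-name pattern follows the standard java.lang:type=GarbageCollector JMX naming; the sample payload is hypothetical, and real collector names vary by JVM and collector (G1, CMS, Parallel, ...).

```python
import re

# Match any garbage collector bean rather than hard-coding one collector,
# so the same query works for G1, CMS, Parallel, etc.
GC_BEAN = re.compile(r"java\.lang:type=GarbageCollector,name=.*")

def gc_beans(jmx):
    """Return (name, CollectionTime, CollectionCount) for each GC bean."""
    return [(b["name"], b["CollectionTime"], b["CollectionCount"])
            for b in jmx["beans"] if GC_BEAN.match(b["name"])]

# Hypothetical excerpt of a RegionServer's /jmx output.
sample = {"beans": [
    {"name": "java.lang:type=GarbageCollector,name=G1 Old Generation",
     "CollectionTime": 31000, "CollectionCount": 2},
    {"name": "Hadoop:service=HBase,name=RegionServer,sub=Server"},
]}
print(gc_beans(sample))
```

CollectionTime here is cumulative milliseconds per collector, so the same divide-by-count caveat from above applies when graphing it.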
Right now the GC Times graph is worse than useless: it's misleading, because it implies there are no GC issues when there are actually very large, very severe GC issues on this cluster.
This is a vanilla Ambari deployed Grafana with Ambari Metrics.