Running an HBase scan load through the profiler revealed that RegionMetricsStorage.incrNumericMetric is called way too often.
It turns out that we make this call for each KV in StoreScanner.next(...).
Incrementing an AtomicLong requires expensive memory barriers.
The observation here is that StoreScanner.next(...) can maintain a simple
long in its internal loop and only update the metric upon exit. Thus the AtomicLong is updated once per call to next(...) rather than once per KV.
That cuts about 10% of the runtime from a scan-only load (I'll quantify this better soon).
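
A rough sketch of the idea, not the actual patch. The class, the METRIC field and both method names are made up for illustration, and it assumes the value being accumulated is a per-KV size; the real StoreScanner and RegionMetricsStorage code looks different.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BatchedMetricSketch {
    // Hypothetical stand-in for the shared counter behind
    // RegionMetricsStorage.incrNumericMetric (assumption, not the HBase field).
    static final AtomicLong METRIC = new AtomicLong();

    // Before: one atomic update per KV inside the loop.
    static int nextPerKv(long[] kvSizes) {
        int count = 0;
        for (long size : kvSizes) {
            METRIC.addAndGet(size); // atomic read-modify-write on every iteration
            count++;
        }
        return count;
    }

    // After: accumulate in a plain local long, publish once on exit.
    static int nextBatched(long[] kvSizes) {
        long localSize = 0; // plain local variable, no atomic operation needed
        int count = 0;
        try {
            for (long size : kvSizes) {
                localSize += size;
                count++;
            }
        } finally {
            // Single atomic update per call, also reached on early exit.
            if (localSize != 0) {
                METRIC.addAndGet(localSize);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] sizes = {10L, 20L, 30L};
        nextBatched(sizes);
        System.out.println(METRIC.get());
    }
}
```

Both variants leave the metric with the same final value; the batched one just makes other readers see the update slightly later, at the end of the call, which should be fine for a metric.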