Currently, MetricsContainer objects are stored directly in ThreadLocal state, so scoping a new container requires both a get and a set of thread-local state. By storing a wrapper object in the thread-local state instead, a single thread-local lookup suffices: we fetch the wrapper once, then set and later reset the container through it.
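As a rough illustration of the wrapper idea, here is a minimal sketch. The names `ContainerHolderSketch`, `Container`, `Holder`, `current`, and `withContainer` are hypothetical and chosen only for this example; they are not the actual classes or methods touched by this change.

```java
import java.util.function.Supplier;

/** Hypothetical sketch; all names are illustrative, not the real Beam classes. */
public class ContainerHolderSketch {

  /** Stand-in for a metrics container; it only carries a name for the demo. */
  static final class Container {
    final String name;

    Container(String name) {
      this.name = name;
    }
  }

  /** Mutable wrapper that stays in the thread-local slot once it is created. */
  static final class Holder {
    Container container; // null means no container is in scope
  }

  // The holder is created lazily once per thread and never removed by this code.
  private static final ThreadLocal<Holder> HOLDER = ThreadLocal.withInitial(Holder::new);

  /** Returns the container currently in scope on this thread, or null. */
  static Container current() {
    return HOLDER.get().container;
  }

  /** Runs body with the given container in scope, then restores the previous one. */
  static <T> T withContainer(Container container, Supplier<T> body) {
    Holder holder = HOLDER.get();          // the only thread-local lookup needed for scoping
    Container previous = holder.container;
    holder.container = container;          // plain field write instead of ThreadLocal.set
    try {
      return body.get();
    } finally {
      holder.container = previous;         // plain field write instead of ThreadLocal.remove
    }
  }

  public static void main(String[] args) {
    String seen = withContainer(new Container("step-1"), () -> current().name);
    System.out.println(seen);              // prints "step-1"
    System.out.println(current() == null); // prints "true"
  }
}
```

In this sketch the thread-local entry is never removed after it is first created, so entering and leaving a scope only mutates a field on the already-fetched holder.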
In a Nexmark query benchmark, this change shows up as a possible 7% CPU improvement.
Additionally, I think that removing entries from the thread-local state adds overhead to get calls, because it makes the linear probing inside the ThreadLocal implementation more expensive.