Details
Type: Bug
Status: Resolved
Resolution: Fixed
Description
(Supersedes/includes CASSANDRA-13033 for 3.X & trunk)
We still use ThreadLocal in a couple of places, so I was curious how much faster FastThreadLocal is compared to ThreadLocal. A microbenchmark shows that, after subtracting the baseline, FastThreadLocal takes ~2.7ns per access and ThreadLocal ~4.7ns, so ThreadLocal is about 2ns slower.
However, looking at the implementations, ThreadLocal seems to perform more dependent pointer loads than FastThreadLocal. The resulting cost of CPU cache misses is not reflected in the artificial benchmark below.
The patch migrates all Thread instances (except a few in tests) to FastThreadLocalThread and all ThreadLocal instances to FastThreadLocal.
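To illustrate why both the thread type and the thread-local type matter, here is a minimal sketch of the general idea behind an index-based thread-local: each variable owns a fixed array index, and a dedicated thread subclass carries the array, so a read is one field load plus one array load instead of a hash probe. This is not Netty's code; SlotThread, SlotLocal and roundTripOnSlotThread are hypothetical names, and the fallback to a plain ThreadLocal for ordinary threads is a simplification.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class SlotLocalSketch {

    /** Hypothetical thread subclass that carries its own slot array. */
    static final class SlotThread extends Thread {
        Object[] slots = new Object[8];

        SlotThread(Runnable task) {
            super(task);
        }
    }

    /** Hypothetical thread-local that owns a fixed index into the slot array. */
    static final class SlotLocal<T> {
        private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

        // Assigned once per variable; every SlotThread uses the same index.
        private final int index = NEXT_INDEX.getAndIncrement();
        // Slow path for threads that are not SlotThread instances.
        private final ThreadLocal<T> fallback = new ThreadLocal<>();

        @SuppressWarnings("unchecked")
        T get() {
            Thread current = Thread.currentThread();
            if (current instanceof SlotThread) {
                // Fast path: field load, then a direct array access.
                Object[] slots = ((SlotThread) current).slots;
                return index < slots.length ? (T) slots[index] : null;
            }
            return fallback.get();
        }

        void set(T value) {
            Thread current = Thread.currentThread();
            if (current instanceof SlotThread) {
                SlotThread thread = (SlotThread) current;
                if (index >= thread.slots.length) {
                    // Grow the array so that this variable's index fits.
                    int newLength = Math.max(index + 1, thread.slots.length * 2);
                    thread.slots = Arrays.copyOf(thread.slots, newLength);
                }
                thread.slots[index] = value;
            } else {
                fallback.set(value);
            }
        }
    }

    /** Sets and reads the value on a fresh SlotThread and returns what get() saw. */
    static String roundTripOnSlotThread(SlotLocal<String> local, String value) {
        String[] seen = new String[1];
        SlotThread thread = new SlotThread(() -> {
            local.set(value);
            seen[0] = local.get();
        });
        thread.start();
        try {
            thread.join(); // join() makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        SlotLocal<String> local = new SlotLocal<>();
        local.set("main"); // main is a plain thread, so this takes the fallback path
        System.out.println(roundTripOnSlotThread(local, "worker")); // prints "worker"
        System.out.println(local.get());                            // prints "main"
    }
}
```

In this sketch the fast path only applies when the current thread is a SlotThread, which mirrors why the patch has to touch thread creation as well as the thread-local declarations.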
[java] FastThreadLocalBench.baseline          2  avgt  5  3.023 ± 0.081  ns/op
[java] FastThreadLocalBench.fastThreadLocal   2  avgt  5  5.610 ± 0.154  ns/op
[java] FastThreadLocalBench.fastThreadLocal   4  avgt  5  5.653 ± 0.042  ns/op
[java] FastThreadLocalBench.fastThreadLocal   8  avgt  5  5.763 ± 0.588  ns/op
[java] FastThreadLocalBench.fastThreadLocal  12  avgt  5  5.673 ± 0.117  ns/op
[java] FastThreadLocalBench.threadLocal       2  avgt  5  7.708 ± 0.723  ns/op
[java] FastThreadLocalBench.threadLocal       4  avgt  5  7.604 ± 0.059  ns/op
[java] FastThreadLocalBench.threadLocal       8  avgt  5  7.629 ± 0.080  ns/op
[java] FastThreadLocalBench.threadLocal      12  avgt  5  7.858 ± 0.483  ns/op