Internally, a co-worker profiled their application, which talks to HBase. More than 60% of the time was spent locating regions, even though the cluster was stable and no regions were moving.
To find out whether there is a faster way to cache region locations, I wrote a benchmark here:
It tries to simulate a heavy load on the location cache:
- 24 threads in total
- 2 threads deleting location data
- 2 threads adding location data
- Reads use a floor lookup to get the result
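A minimal sketch of that workload mix follows. It is not the original benchmark. It assumes String row keys, a hypothetical key space of 10,000 cached regions, and that the remaining 20 threads do floor lookups. The class `LocationCacheWorkload` and its parameters are illustrative names only, and the map passed in must be thread-safe (for example `ConcurrentSkipListMap`).

```java
import java.util.NavigableMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical workload sketch: 2 deleters, 2 adders, the rest do floor lookups.
class LocationCacheWorkload {
    static final int THREADS = 24;
    static final int DELETERS = 2;
    static final int ADDERS = 2;
    static final int KEYS = 10_000; // assumed number of cached region start keys

    // Runs the mixed workload for durationMillis and returns the number of lookups done.
    static long run(NavigableMap<String, String> cache, long durationMillis)
            throws InterruptedException {
        for (int i = 0; i < KEYS; i++) {
            cache.put(key(i), "server-" + i); // pre-fill the cache
        }
        AtomicBoolean stop = new AtomicBoolean(false);
        LongAdder reads = new LongAdder();
        Thread[] threads = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int role = t;
            threads[t] = new Thread(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                while (!stop.get()) {
                    int i = rnd.nextInt(KEYS);
                    if (role < DELETERS) {
                        cache.remove(key(i));                  // clear a cached location
                    } else if (role < DELETERS + ADDERS) {
                        cache.put(key(i), "server-" + i);      // re-add a location
                    } else {
                        cache.floorEntry(key(i) + "row");      // region containing this row
                        reads.increment();
                    }
                }
            });
            threads[t].start();
        }
        Thread.sleep(durationMillis);
        stop.set(true);
        for (Thread th : threads) {
            th.join();
        }
        return reads.sum();
    }

    // Zero-padded keys so lexicographic order matches numeric order.
    static String key(int i) {
        return String.format("%08d", i);
    }
}
```

Calling `LocationCacheWorkload.run(new ConcurrentSkipListMap<>(), 1000)` returns the lookup count for one second; comparing that count across map implementations is the kind of comparison the benchmark makes.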
To reproduce my results, run ./ and it should produce a result.csv file.
ConcurrentSkipListMap is a good middle ground: it is about as fast for writes as it is for reads.
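For reference, here is roughly how a skip-list-backed location cache answers a lookup. This is a sketch, not HBase's actual cache: it assumes String keys instead of byte[], stores a plain server name as the value, and ignores region end keys. `SkipListLocationCache` and its methods are hypothetical names. The empty string stands for the first region's start key, which is empty in HBase.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical location cache keyed by region start key.
class SkipListLocationCache {
    private final ConcurrentSkipListMap<String, String> byStartKey = new ConcurrentSkipListMap<>();

    void cache(String regionStartKey, String server) {
        byStartKey.put(regionStartKey, server);
    }

    void evict(String regionStartKey) {
        byStartKey.remove(regionStartKey);
    }

    // floorEntry returns the entry with the greatest start key <= row,
    // i.e. the cached region that would contain the row (end keys ignored here).
    String locate(String row) {
        Map.Entry<String, String> e = byStartKey.floorEntry(row);
        return e == null ? null : e.getValue();
    }
}
```

Reads and writes both go through the same lock-free skip list, which is why neither side is strongly favored.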
However, most operations will not need to remove or add a region location. There will potentially be several orders of magnitude more reads of cached locations than clears of the cache.
So I propose a copy-on-write tree map.
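One way such a map could look is sketched below. It is only a sketch under assumptions: String keys, String server names as values, and a hypothetical class name `CopyOnWriteLocationCache`. Readers take a single volatile read of an immutable `TreeMap` snapshot and then do an ordinary floor lookup with no locking. Writers are serialized, copy the whole map, modify the copy, and publish it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical copy-on-write location cache keyed by region start key.
class CopyOnWriteLocationCache {
    // Immutable snapshot; replaced wholesale on every write.
    private volatile NavigableMap<String, String> snapshot =
            Collections.unmodifiableNavigableMap(new TreeMap<>());

    // Lock-free read: one volatile read, then an O(log n) floor lookup.
    String locate(String row) {
        Map.Entry<String, String> e = snapshot.floorEntry(row);
        return e == null ? null : e.getValue();
    }

    // Writers are serialized; each write copies the map in O(n).
    synchronized void cache(String regionStartKey, String server) {
        TreeMap<String, String> copy = new TreeMap<>(snapshot);
        copy.put(regionStartKey, server);
        snapshot = Collections.unmodifiableNavigableMap(copy);
    }

    synchronized void evict(String regionStartKey) {
        if (!snapshot.containsKey(regionStartKey)) {
            return; // nothing to remove, so skip the copy
        }
        TreeMap<String, String> copy = new TreeMap<>(snapshot);
        copy.remove(regionStartKey);
        snapshot = Collections.unmodifiableNavigableMap(copy);
    }
}
```

The trade-off is deliberate: every add or clear pays for a full copy, while reads never block or contend. That only pays off when reads outnumber writes by a wide margin, as expected for cached region locations.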