Details
-
Improvement
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
8.8.1
-
None
-
None
-
New
Description
While working on an unrelated getBulkPath API PR, Michael McCandless and I came across a nice optimization that could be made to the taxonomy cache.
The taxonomy cache today caches frequently used ordinals and their corresponding FacetLabels. It uses the existing LRUHashMap (backed by a LinkedList) class for its implementation.
This implementation performs sub optimally when it has a large number of threads accessing it, and consumes a large amount of RAM.
Michael McCandless suggested the idea of a two array backed HPPC int->FacetLabel cache. The basic idea behind the cache being:
- We use two hashmaps primary and secondary.
- In case of a cache miss in the primary and a cache hit in the secondary, we add the key to the primary map as well.
- In case of a cache miss in both the maps, we add it to the primary map.
- When we reach (make this check each time we insert?) a large number of elements in say the primary cache, (say larger than the existing DEFAULT_CACHE_VALUE=4000), we dump the secondary map and copy all the values of the primary map into it.
The idea was originally explained in this comment.