[LUCENE-10068] Switch to a "double barrel" HPPC cache for the taxonomy LRU cache - ASF JIRA

Add vote

Watch issue

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 8.8.1
Fix Version/s: None
Component/s: modules/facet
Labels:
None

Lucene Fields:

New

Description

While working on an unrelated getBulkPath API PR, Michael McCandless and I came across a nice optimization that could be made to the taxonomy cache.

The taxonomy cache today caches frequently used ordinals and their corresponding FacetLabels. It uses the existing LRUHashMap (backed by a LinkedList) class for its implementation.

This implementation performs sub optimally when it has a large number of threads accessing it, and consumes a large amount of RAM.

Michael McCandless suggested the idea of a two array backed HPPC int->FacetLabel cache. The basic idea behind the cache being:

We use two hashmaps primary and secondary.
In case of a cache miss in the primary and a cache hit in the secondary, we add the key to the primary map as well.
In case of a cache miss in both the maps, we add it to the primary map.
When we reach (make this check each time we insert?) a large number of elements in say the primary cache, (say larger than the existing DEFAULT_CACHE_VALUE=4000), we dump the secondary map and copy all the values of the primary map into it.

The idea was originally explained in this comment.