HIVE-22492 introduced thread-local buffers in which LlapCacheableBuffer instances are staged before entering the LRFU policy's heap, in order to reduce lock contention.
This is a nice performance improvement, but it comes at the cost of losing exact accounting of LLAP buffer instances. For example, if a user issues a purge command, not all of the cache space is freed up as one would expect, because purge only considers buffers the policy knows about. In this case LLAP's iomem servlet would show the LRFU policy as empty, while a table may still have its full content loaded into the cache.
Also, when text-based tables are loaded into the cache, a set of ephemeral -OrcEncode threads is used. Buffers attached to these threads' thread-local structures are ultimately lost when the threads die. In an edge case we could load lots of data into the cache by reading many distinct smaller text tables whose buffers never reach the LRFU policy; the cache hit ratio would suffer as a consequence (the memory manager would give up asking LRFU to evict, and would free up random buffers instead).
I propose we track the amount of data stored in the BP wrapper thread-locals, and flush them into the heap as the first step of a purge request. This would improve supportability.
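A minimal sketch of the idea, with hypothetical names (these are not Hive's actual classes): each thread appends buffers to its own staging list without touching a shared lock, a global registry keeps track of all staging lists and of the total staged bytes, and purge first drains every list into the shared policy heap so that eviction sees everything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of thread-local buffer staging with a global
 * registry, so a purge can flush every thread's pending buffers into
 * the shared policy heap and account for the staged bytes.
 */
class BufferStagingSketch {
  static class Buffer {
    final long size;
    Buffer(long size) { this.size = size; }
  }

  // Stand-in for the shared LRFU heap; the real structure is more involved.
  final Queue<Buffer> policyHeap = new ConcurrentLinkedQueue<>();
  // Registry of every thread's staging list, so purge can reach them all.
  final Queue<List<Buffer>> allStagingLists = new ConcurrentLinkedQueue<>();
  // Bytes currently parked in thread-locals, for supportability metrics.
  final AtomicLong stagedBytes = new AtomicLong();

  // Each thread lazily creates its staging list and registers it globally.
  final ThreadLocal<List<Buffer>> staging = ThreadLocal.withInitial(() -> {
    List<Buffer> list = new ArrayList<>();
    allStagingLists.add(list);
    return list;
  });

  // Hot path: no shared lock contention, only a thread-local append.
  void cache(Buffer b) {
    List<Buffer> list = staging.get();
    synchronized (list) { list.add(b); }
    stagedBytes.addAndGet(b.size);
  }

  // First step of a purge: drain all staging lists into the shared heap.
  void flushStagedBuffers() {
    for (List<Buffer> list : allStagingLists) {
      synchronized (list) {
        for (Buffer b : list) {
          policyHeap.add(b);
          stagedBytes.addAndGet(-b.size);
        }
        list.clear();
      }
    }
  }
}
```

With this shape, the staged-bytes counter also gives the iomem servlet an honest picture of how much data sits outside the policy at any moment.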
We should also replace the ephemeral OrcEncode threads with a thread pool. That could be a small performance improvement on its own, by saving the time and memory spent on thread lifecycle management.
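The pool could look roughly like the following sketch (class name, pool size, and thread naming are assumptions, not Hive's actual code). Because pool threads survive across loads, any thread-local staging they hold is no longer silently lost, and thread creation and teardown cost is paid only once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: a long-lived encoder pool instead of spawning a
 * fresh -OrcEncode thread per text-table load.
 */
class EncoderPoolSketch {
  private final AtomicInteger threadId = new AtomicInteger();

  // Fixed-size daemon pool with recognizable thread names for jstack.
  final ExecutorService pool = Executors.newFixedThreadPool(4, r -> {
    Thread t = new Thread(r, "OrcEncode-" + threadId.incrementAndGet());
    t.setDaemon(true);
    return t;
  });
}
```

Encode work would then be submitted as tasks to this pool rather than run on throwaway threads.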