Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
Hive currently uses either the FIFO or LRFU caching algorithms. My understanding is that LRFU is O(lg n) and makes predictions based on the current working set.
A more modern alternative is TinyLFU which is O(1) by capturing the frequency in a sketch and recency through LRU queues. This allows it to outperforms ARC and LIRS, two of the best policies to date.
Caffeine provides a concurrent implementation by using the write-ahead log approach to record & replay updates (see HighScalability article). It is currently being migrated to in HBASE-15560 and ACCUMULO-4177 for their on-heap caches.
If there is interest let me know if I can be of help.
Attachments
Issue Links
- relates to
-
HIVE-10155 LLAP: switch to sensible cache policy
- Resolved