Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-14983

Evaluate TinyLFU cache

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • llap
    • None

    Description

      The ORC low-level cache is either FIFO or LRFU, with the latter being the default. Least-Recently-Freq-Used is an O(lg n) policy that tries to balance recency with frequency for the working set. It uses a heap for frequency ordering and a linked list for recency ordering.

      TinyLFU is an O(1) policy that uses a sketch to probabilistically estimate an entry's frequency. Instead of focusing on eviction, the policy filters out low-value entries from entering the cache. It retains a larger history than the working set by retaining the frequency in a compact counter array (e.g. 4-bit CountMin Sketch). Simulations show that a small LRU window + TinyLFU + SLRU main cache has the best performance.

      If the project is ready to adopt Java 8 then it could use Caffeine, the successor to Guava's cache. It provides improved concurrency in addition to the higher hit rate.

      But I think extracting the project's CountMin Sketch for a custom implementation would be a win. The result would be simpler code and an improved hit rates.

      Attachments

        Activity

          People

            Unassigned Unassigned
            ben.manes Ben Manes
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: