Samza / SAMZA-873

Avoid unnecessary flushes in CachedStore

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.1
    • Component/s: kv
    • Labels: None
    • Patch

    Description

      The class org.apache.samza.storage.kv.CachedStore currently calls store.flush() whenever it evicts dirty entries. This in turn causes RocksDB to flush its memtables far more often than necessary, which slows the store down.

      In a mixed put/get workload, e.g. 2 gets for every put with an object cache size of 1000, RocksDB will flush its memtable roughly every 333 calls to put(), that is, every time the eldest entry in the cache is dirty (a dirty entry becomes the eldest after about 1000 cache accesses, a third of which are puts). In our benchmarks, this leads to a more than 20x drop in throughput.
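
The 333 figure can be checked with a small model. The sketch below is not Samza code: the class FlushIntervalModel and its method are hypothetical, and it assumes the worst case where every get and put touches a fresh key, so insertion order equals LRU order. It counts how often the eldest entry is dirty at eviction time, clearing all dirty flags each time, as the pre-patch flush would.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not part of Samza: estimates how many put() calls
// happen between flushes that are triggered by evicting a dirty entry.
class FlushIntervalModel {

    static double averagePutsPerFlush(int cacheSize, int getsPerPut, int totalPuts) {
        // key -> dirty flag. Assumption: every operation uses a new key,
        // so insertion order is also least-recently-used order.
        LinkedHashMap<Integer, Boolean> cache = new LinkedHashMap<>();
        int nextKey = 0;
        int flushes = 0;

        for (int p = 0; p < totalPuts; p++) {
            // getsPerPut cache-miss gets (clean entries), then one put (dirty entry)
            for (int op = 0; op <= getsPerPut; op++) {
                boolean isPut = (op == getsPerPut);
                cache.put(nextKey++, isPut);

                if (cache.size() > cacheSize) {
                    Iterator<Map.Entry<Integer, Boolean>> it = cache.entrySet().iterator();
                    boolean eldestDirty = it.next().getValue();
                    it.remove(); // evict the eldest entry
                    if (eldestDirty) {
                        // Pre-patch behavior: write out all dirty entries and flush,
                        // which leaves every cached entry clean.
                        flushes++;
                        cache.replaceAll((key, dirty) -> false);
                    }
                }
            }
        }
        return (double) totalPuts / flushes;
    }

    public static void main(String[] args) {
        System.out.printf("average puts per flush: %.1f%n",
                averagePutsPerFlush(1000, 2, 100_000));
    }
}
```

With a cache size of 1000 and 2 gets per put, the model reports an average close to the 333 estimate. A workload with cache hits would change the exact number.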

      The attached patch fixes the issue as follows:

      • CachedStore.put() no longer flushes when evicting dirty entries.
        It calls store.putAll() with all dirty entries and resets the dirty list and count but does not call store.flush().
      • Likewise, CachedStore.cache.removeEldestEntry() no longer flushes when evicting dirty entries.
        It calls store.putAll() on all dirty entries and resets the dirty list and count.
      • The behavior of CachedStore.flush() is unaffected.
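
The patched behavior described above can be sketched as follows. This is a simplified illustration and not the actual CachedStore source: CountingStore and WriteBackCache are hypothetical names, the underlying store is an in-memory stand-in, and the dirty list is modeled as a map. The one point it is meant to show is that evicting a dirty entry writes the dirty entries with putAll() but leaves flush() to the explicit flush() call.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the underlying store; it only counts calls.
class CountingStore {
    final Map<String, String> data = new HashMap<>();
    int putAllCalls = 0;
    int flushCalls = 0;

    void putAll(Map<String, String> entries) { data.putAll(entries); putAllCalls++; }
    String get(String key) { return data.get(key); }
    void flush() { flushCalls++; }
}

// Simplified write-back cache illustrating the post-patch eviction behavior.
class WriteBackCache {
    private final CountingStore store;
    private final Map<String, String> dirty = new LinkedHashMap<>();
    private final LinkedHashMap<String, String> cache;

    WriteBackCache(CountingStore store, int cacheSize) {
        this.store = store;
        // Access-ordered map, so the eldest entry is the least recently used one.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                boolean evict = size() > cacheSize;
                if (evict && dirty.containsKey(eldest.getKey())) {
                    writeDirtyEntries(); // putAll only, no store.flush()
                }
                return evict;
            }
        };
    }

    // Writes all dirty entries to the store and resets the dirty set.
    private void writeDirtyEntries() {
        if (!dirty.isEmpty()) {
            store.putAll(dirty);
            dirty.clear();
        }
    }

    void put(String key, String value) {
        dirty.put(key, value); // mark dirty first so an eviction below includes it
        cache.put(key, value);
    }

    String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = store.get(key); // evicted entries were already written by putAll
            if (value != null) cache.put(key, value);
        }
        return value;
    }

    // Unchanged contract: write dirty entries, then flush the underlying store.
    void flush() {
        writeDirtyEntries();
        store.flush();
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        WriteBackCache cache = new WriteBackCache(store, 3);
        for (int i = 0; i < 10; i++) cache.put("k" + i, "v" + i);
        System.out.println("flushes after evictions: " + store.flushCalls);
        cache.flush();
        System.out.println("flushes after explicit flush(): " + store.flushCalls);
    }
}
```

Reads stay consistent in this sketch because an evicted dirty entry has already reached the store through putAll() before it leaves the cache, so a later get() can fetch it without any flush having taken place.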

          People

            Assignee: Nicolas Maquet (nicolas@movio.co)
            Reporter: Nicolas Maquet (nicolas@movio.co)
            Votes: 0
            Watchers: 3