The class org.apache.samza.storage.kv.CachedStore currently calls store.flush() whenever it evicts a dirty entry. This forces RocksDB to flush its memtables far more often than necessary, which slows the store down.
In a mixed put/get workload, e.g. 2 gets for every put with an object cache size of 1000, RocksDB flushes its memtable roughly every 333 calls to put(), that is, every time the eldest entry in the cache is dirty. In our benchmarks, this leads to a more than 20x drop in throughput.
The attached patch fixes the issue as follows:
- CachedStore.put() no longer flushes when evicting dirty entries.
It calls store.putAll() with all dirty entries and resets the dirty list and count but does not call store.flush().
- Likewise, CachedStore.cache.removeEldestEntry() no longer flushes when evicting dirty entries.
It calls store.putAll() with all dirty entries and resets the dirty list and count but does not call store.flush().
- The behavior of CachedStore.flush() is unaffected.
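
The intended behavior can be sketched roughly as follows. This is a simplified Java model and not the actual Samza code: CachedStoreSketch, CountingStore, Cache, writeDirtyEntries() and demo() are hypothetical names used only for illustration, the underlying store merely counts calls, and details of the real class are omitted. It shows eviction of a dirty entry writing the dirty entries through putAll() without a flush, while an explicit flush() still flushes.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of the patched behavior; not the actual Samza code.
class CachedStoreSketch {

    /** Stand-in for the underlying RocksDB-backed store; it only counts calls. */
    static class CountingStore {
        final Map<String, String> data = new HashMap<>();
        int putAllCalls = 0;
        int flushCalls = 0;

        void putAll(Map<String, String> entries) {
            data.putAll(entries);
            putAllCalls++;
        }

        void flush() {
            flushCalls++; // in the real store this is the expensive memtable flush
        }
    }

    /** LRU object cache that writes dirty entries back without flushing. */
    static class Cache {
        private final CountingStore store;
        private final Map<String, String> dirty = new LinkedHashMap<>();
        private final LinkedHashMap<String, String> cache;

        Cache(CountingStore store, final int capacity) {
            this.store = store;
            // accessOrder = true makes the map iterate from least to most recently used
            this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    boolean evict = size() > capacity;
                    if (evict && dirty.containsKey(eldest.getKey())) {
                        writeDirtyEntries(); // putAll only, no store.flush()
                    }
                    return evict;
                }
            };
        }

        void put(String key, String value) {
            dirty.put(key, value);
            cache.put(key, value); // may evict the eldest entry
        }

        String get(String key) {
            String value = cache.get(key);
            if (value == null) {
                value = store.data.get(key);
                if (value != null) {
                    cache.put(key, value); // cached as a clean entry
                }
            }
            return value;
        }

        /** Explicit flush: write the dirty entries, then flush the store. */
        void flush() {
            writeDirtyEntries();
            store.flush();
        }

        private void writeDirtyEntries() {
            if (!dirty.isEmpty()) {
                store.putAll(dirty);
                dirty.clear(); // reset the dirty list
            }
        }
    }

    /** Returns {putAll calls, flush calls} after an eviction, then after flush(). */
    static int[] demo() {
        CountingStore store = new CountingStore();
        Cache cache = new Cache(store, 2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3"); // exceeds capacity and evicts dirty entry "a"
        int putAllAfterEviction = store.putAllCalls;
        int flushAfterEviction = store.flushCalls;
        cache.flush();
        return new int[] {putAllAfterEviction, flushAfterEviction,
                          store.putAllCalls, store.flushCalls};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("after eviction: putAll=" + r[0] + " flush=" + r[1]);
        System.out.println("after flush():  putAll=" + r[2] + " flush=" + r[3]);
    }
}
```

In this model, the third put() exceeds the capacity of 2 and evicts a dirty entry, which results in one putAll() call and no flush; the store is flushed only when flush() is called explicitly.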