Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-873

Avoid unnecessary flushes in CachedStore

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.1
    • Component/s: kv
    • Labels:
      None
    • Flags:
      Patch

      Description

      The class org.apache.samza.storage.kv.CachedStore is currently calling store.flush() when evicting dirty entries. This in turn causes RocksDB to flush its memtables much more than necessary, causing slowdowns.

      In a mixed put / get workload, e.g. 2 gets for 1 put with an object cache size of 1000, RocksDB will flush its memtable roughly every 333 calls to put(); that is every time the eldest entry from the cache is dirty. In our benchmarks, this leads to a more than 20x drop in throughput.

      The attached patch fixes the issue as follows:

      • CachedStore.put() no longer flushes when evicting dirty entries.
        It calls store.putAll() with all dirty entries and resets the dirty list and count but does not call store.flush().
      • Likewise, CachedStore.cache.removeEldestEntry() no longer flushes when evicting dirty entries.
        It calls store.putAll() on all dirty entries and resets the dirty list and count.
      • The behavior of CachedStore.flush() is unaffected.

        Activity

        Hide
        nickpan47 Yi Pan (Data Infrastructure) added a comment -

        Nicolas Maquet, could you create a reviewboard request for the patch? It would be easier for review.

        Thanks!

        Show
        nickpan47 Yi Pan (Data Infrastructure) added a comment - Nicolas Maquet , could you create a reviewboard request for the patch? It would be easier for review. Thanks!
        Hide
        nicolas@movio.co Nicolas Maquet added a comment - - edited

        Yi Pan (Data Infrastructure) I have already but didn't link it here. Here it its: https://reviews.apache.org/r/43589

        Show
        nicolas@movio.co Nicolas Maquet added a comment - - edited Yi Pan (Data Infrastructure) I have already but didn't link it here. Here it its: https://reviews.apache.org/r/43589
        Hide
        nickpan47 Yi Pan (Data Infrastructure) added a comment -

        Cool! I will take a look asap.

        Thanks!

        Show
        nickpan47 Yi Pan (Data Infrastructure) added a comment - Cool! I will take a look asap. Thanks!
        Hide
        nickpan47 Yi Pan (Data Infrastructure) added a comment -

        +1. Merged and submitted. Thanks!

        Show
        nickpan47 Yi Pan (Data Infrastructure) added a comment - +1. Merged and submitted. Thanks!

          People

          • Assignee:
            nicolas@movio.co Nicolas Maquet
            Reporter:
            nicolas@movio.co Nicolas Maquet
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development