[SAMZA-957] Avoid unnecessary KV Store flushes (part 3) - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.10.1
Component/s: None
Labels: None

Description

We had an issue where RocksDB performance severely degraded for 23 hours and then resolved itself. To troubleshoot the issue I gathered some samples of the compaction stats from the RocksDB log and engaged with the RocksDB team via an existing, related issue: https://github.com/facebook/rocksdb/issues/696#issuecomment-222549220

They pointed out that the job was flushing excessively:

"If you overload RocksDB with work (i.e. do bunch of writes really fast, or in your case, bunch of small flushes), it will begin stalling writes while the compactions (deferred work) completes. An interesting thing with RocksDB and LSM architecture is that the more behind you are on compactions, the more expensive the compactions are (due to increased write amplifications and single-threadness of L0->L1 compaction). So our write stalls have to be tuned exactly right for RocksDB to behave well with extremely high write rate."

Looking through our commit history, I see that SAMZA-812 and SAMZA-873 were both intended to address this issue by reducing the number of flushes in CachedStore.

To be fair, the job in question did not have the SAMZA-873 patch, but I see even more room for improvement. Namely, CachedStore should never flush the underlying store unless its own flush() is called. It can purge its dirty items, trading some performance for correctness, but flushing is excessive. So, this patch removes the flushes from the all() and range() methods, simplifies the LRU logic, and adds a unit test to verify and explain the proper LRU behavior.
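The intended behavior can be illustrated with a minimal sketch. This is not the Samza CachedStore code or the attached patch: the class names CountingStore and WriteBackCache, the String key/value types, and the allKeys() method are all hypothetical, and the sketch assumes that "purging dirty items" means writing the cached dirty entries through to the underlying store without asking it to flush.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CachedStoreSketch {

    /** Hypothetical stand-in for the underlying store; it only counts flushes. */
    static class CountingStore {
        final TreeMap<String, String> data = new TreeMap<>();
        int flushCount = 0;

        void putAll(Map<String, String> entries) { data.putAll(entries); }
        String get(String key) { return data.get(key); }
        List<String> allKeys() { return new ArrayList<>(data.keySet()); }
        void flush() { flushCount++; } // the expensive operation we want to avoid
    }

    /** Simplified write-back cache (no eviction, no deletes) in front of the store. */
    static class WriteBackCache {
        private final CountingStore store;
        private final Map<String, String> dirty = new LinkedHashMap<>();

        WriteBackCache(CountingStore store) { this.store = store; }

        void put(String key, String value) { dirty.put(key, value); }

        String get(String key) {
            String cached = dirty.get(key);
            return cached != null ? cached : store.get(key);
        }

        /** Writes dirty entries to the store but does NOT flush it. */
        private void purgeDirtyEntries() {
            store.putAll(dirty);
            dirty.clear();
        }

        /** Iteration must see the latest writes, so purge first; no flush needed. */
        List<String> allKeys() {
            purgeDirtyEntries();
            return store.allKeys();
        }

        /** The only path that flushes the underlying store. */
        void flush() {
            purgeDirtyEntries();
            store.flush();
        }
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        WriteBackCache cache = new WriteBackCache(store);
        cache.put("b", "2");
        cache.put("a", "1");
        System.out.println(cache.allKeys() + " flushes=" + store.flushCount);
        cache.flush();
        System.out.println("flushes=" + store.flushCount);
    }
}
```

In this sketch, iterating over the store sees the most recent writes because the dirty entries are purged first, yet the flush counter only moves when the caller explicitly invokes flush() on the cache.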

Attachments

SAMZA-957_1.patch
01/Jun/16 02:19
6 kB
Jake Maes

People

Assignee: Jake Maes

Reporter: Jake Maes

Votes: 0

Watchers: 2

Dates

Created: 01/Jun/16 02:18

Updated: 07/Jun/16 22:47

Resolved: 07/Jun/16 22:47
