Details
Type: Sub-task
Status: Resolved
Priority: Critical
Resolution: Fixed
Private Beta
Description
Currently, we have some rudimentary write-throttling when a server is under too much RAM pressure. KUDU-542 discusses making it a bit more effective, but we also need to make sure that it's effective in a replicated setup.
For example:
- We have three servers replicating a tablet.
- One of the followers is IO-constrained (or has more load from other tablets) and thus can't flush as fast as the other two nodes. It starts to apply write throttling in Apply().
Does blocking Apply() end up producing enough backpressure to slow down the other tablets? My guess is not: we probably just end up with a backlogged Apply threadpool work queue, so a ton of RAM gets used up there instead of in the MRS, and we'd still eventually OOM.
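To make the concern concrete, here is a toy, single-threaded simulation of a slow follower. It is only a sketch under stated assumptions, not Kudu code: `ApplyQueue`, `try_submit`, `apply_one`, and `simulate` are hypothetical names, and the rates and sizes are made up. It compares an unbounded work queue, where throttled applies just move the memory growth into the queue, with a byte-bounded queue that rejects submissions and so gives the sender a signal to slow down.

```python
from collections import deque


class ApplyQueue:
    """Toy stand-in for a follower's Apply work queue (hypothetical, not Kudu's API)."""

    def __init__(self, max_bytes=None):
        self.max_bytes = max_bytes  # None means the queue is unbounded
        self.ops = deque()
        self.queued_bytes = 0

    def try_submit(self, op_bytes):
        """Enqueue an op; return False if a byte limit is set and would be exceeded."""
        if self.max_bytes is not None and self.queued_bytes + op_bytes > self.max_bytes:
            return False  # rejection is the backpressure signal to the sender
        self.ops.append(op_bytes)
        self.queued_bytes += op_bytes
        return True

    def apply_one(self):
        """Apply (dequeue) one op, releasing the memory it held in the queue."""
        if not self.ops:
            return 0
        op_bytes = self.ops.popleft()
        self.queued_bytes -= op_bytes
        return op_bytes


def simulate(queue, ticks, incoming_per_tick, applied_per_tick, op_bytes=1024):
    """Run a fixed number of ticks; return (peak queued bytes, rejected op count)."""
    peak = 0
    rejected = 0
    for _ in range(ticks):
        # Replicated writes arrive at the leader's pace.
        for _ in range(incoming_per_tick):
            if not queue.try_submit(op_bytes):
                rejected += 1
        peak = max(peak, queue.queued_bytes)
        # The throttled follower applies far fewer ops than it receives.
        for _ in range(applied_per_tick):
            queue.apply_one()
    return peak, rejected


if __name__ == "__main__":
    unbounded = simulate(ApplyQueue(), ticks=100, incoming_per_tick=10, applied_per_tick=1)
    bounded = simulate(ApplyQueue(max_bytes=64 * 1024), ticks=100,
                       incoming_per_tick=10, applied_per_tick=1)
    print("unbounded (peak bytes, rejected):", unbounded)
    print("bounded   (peak bytes, rejected):", bounded)
```

In the unbounded case nothing is ever rejected and queued bytes grow for as long as the follower stays slow, which is the OOM path described above. In the bounded case the queue's memory is capped, but only because submissions are refused; whether and how that refusal propagates back to the leader and the client is exactly the open question in this ticket.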