[CASSANDRA-11738] Re-think the use of Severity in the DynamicEndpointSnitch calculation - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Low
Resolution: Fixed
Fix Version/s: 3.10
Component/s: Legacy/Core
Labels: None

Description

~~CASSANDRA-11737~~ was opened to allow completely disabling the use of severity in the DynamicEndpointSnitch calculation, but that is a pretty big hammer. There is probably something we can do to make better use of the score.

The issue seems to be that severity is given equal weight with latency in the current code, and that severity is based only on disk I/O. If a node is CPU bound on something (say, catching up on LCS compactions because of bootstrap/repair/replace), its IO wait can be low while the latency to the node is high.

Some ideas I had are:
1. Allow a yaml parameter to tune how much impact the severity score has in the calculation.
2. Take CPU load into account as well as IO wait (this would probably help in the cases where I have seen things go sideways).
3. Move the -D flag from ~~CASSANDRA-11737~~ to a yaml-level setting.
4. Go back to relying on latency alone and get rid of severity altogether. Now that we have rapid read protection, latency may be enough on its own, since it can help in the cases where the predictive nature of IO wait would have been useful.
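
To make idea 1 concrete, here is a minimal sketch of a score that scales severity by a tunable weight. It is not the actual DynamicEndpointSnitch code: the class name, the `score` method, and the `severityWeight` parameter are all hypothetical, and the only thing taken from the description above is that severity is currently added to latency with equal weight.

```java
// Hypothetical sketch only; none of these names exist in Cassandra.
final class WeightedSeveritySketch {

    /**
     * Combines a normalized latency with a weighted severity.
     *
     * @param medianLatencyMs median latency observed for the endpoint
     * @param maxLatencyMs    largest median latency across all endpoints
     * @param severity        severity reported by the endpoint
     * @param severityWeight  assumed tunable (e.g. a yaml setting):
     *                        0.0 ignores severity, 1.0 weights it equally
     */
    static double score(double medianLatencyMs, double maxLatencyMs,
                        double severity, double severityWeight) {
        if (maxLatencyMs <= 0) {
            throw new IllegalArgumentException("maxLatencyMs must be positive");
        }
        // Normalize latency into [0, 1] relative to the slowest endpoint.
        double latencyScore = medianLatencyMs / maxLatencyMs;
        // Lower scores are better; severity only adds to the score.
        return latencyScore + severityWeight * severity;
    }

    public static void main(String[] args) {
        // Node A: CPU bound, high latency, no reported severity.
        // Node B: low latency, but reporting some severity.
        for (double weight : new double[] {0.0, 0.5, 1.0}) {
            double a = score(20.0, 20.0, 0.0, weight);
            double b = score(5.0, 20.0, 1.0, weight);
            System.out.println("weight=" + weight + " A=" + a + " B=" + b);
        }
    }
}
```

With a weight of 0.0 this reduces to idea 4 (latency only), so a single setting of this kind could also cover the on/off switch in idea 3.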

Attachments

- 11738.txt, 05/Jul/16 22:33, 15 kB, Jonathan Ellis

People

Assignee: Jonathan Ellis

Reporter: Jeremiah Jordan

Authors: Jonathan Ellis

Reviewers: Jeremiah Jordan

Votes: 0

Watchers: 10

Dates

Created: 09/May/16 16:23

Updated: 16/Apr/19 09:30

Resolved: 22/Jul/16 22:25