[CASSANDRA-14459] DynamicEndpointSnitch should never prefer latent nodes - ASF JIRA

Details

Type: Improvement
Status: Patch Available
Priority: Low
Resolution: Unresolved
Fix Version/s: 5.x
Component/s: Legacy/Coordination
Labels:
- 4.0-feature-freeze-review-requested
- pull-request-available

Description

The DynamicEndpointSnitch has two unfortunate behaviors that allow it to prefer latent hosts as replicas:

- It loses all latency information when Cassandra restarts.
- It clears latency information entirely every ten minutes (by default), allowing global queries to be routed to other datacenters (and local queries across racks/AZs).
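The failure mode can be sketched with a deliberately simplified, hypothetical model: each host keeps a window of latency samples, its score is the mean of those samples, and a host with no samples scores zero (the "status quo of zero"). The class and method names are illustrative only and are not Cassandra's actual API; the real snitch's scoring is more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the status quo; not Cassandra's actual DynamicEndpointSnitch.
class StatusQuoSnitchModel {
    private static final int WINDOW_SIZE = 100; // hypothetical per-host sample window
    private final Map<String, Deque<Double>> samples = new HashMap<>();

    // Record one observed request latency, evicting the oldest sample when the window is full.
    void receiveTiming(String host, double latencyMs) {
        Deque<Double> window = samples.computeIfAbsent(host, h -> new ArrayDeque<>());
        if (window.size() == WINDOW_SIZE) {
            window.removeFirst();
        }
        window.addLast(latencyMs);
    }

    // Status quo reset (every ten minutes by default): all latency information is discarded.
    void reset() {
        samples.clear();
    }

    // Mean latency, used here only for simplicity. A host with no samples scores 0.0,
    // so it looks like the fastest replica.
    double score(String host) {
        Deque<Double> window = samples.get(host);
        if (window == null || window.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double sample : window) {
            sum += sample;
        }
        return sum / window.size();
    }

    // Order replicas from lowest to highest score (lower is preferred).
    List<String> sortByProximity(List<String> replicas) {
        List<String> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingDouble(this::score));
        return sorted;
    }
}
```

After `reset()`, a query that has just landed on `local` gives it a nonzero score while `remote` has no samples and scores 0.0, so `sortByProximity(List.of("local", "remote"))` puts `remote` first: the latent, cross-datacenter host is preferred.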

This means that the first few queries after a restart or reset can be quite slow compared to average latencies. I propose we solve this by resetting to the minimum observed latency instead of completely clearing the samples, and by extending the isLatencyForSnitch idea from a two-state variable to a three-state one: YES, NO, or MAYBE. This extension allows EchoMessages and PingMessages to send MAYBE, indicating that the DS (DynamicEndpointSnitch) should use those measurements only if it has one or fewer samples for a host. This fixes both problems: on process restart we send out PingMessages/EchoMessages as part of startup, and on reset each host drops to effectively its RTT (at which point normal gossip EchoMessages also have an opportunity to add an additional latency measurement).
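A minimal sketch of the proposal, assuming a per-host window of latency samples scored by their mean. `LatencyForSnitch`, `ProposedSnitchModel`, and all method names are hypothetical stand-ins, not Cassandra's actual classes: NO samples are ignored, YES samples are always recorded, MAYBE samples (from EchoMessages/PingMessages) are recorded only while the host has one or fewer samples, and `reset()` collapses each host's window to its minimum observed latency instead of clearing it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical three-state replacement for a boolean isLatencyForSnitch flag.
enum LatencyForSnitch { YES, NO, MAYBE }

// Hypothetical, simplified sketch of the proposal; not Cassandra's actual classes.
class ProposedSnitchModel {
    private static final int WINDOW_SIZE = 100; // hypothetical per-host sample window
    private final Map<String, Deque<Double>> samples = new HashMap<>();

    void receiveTiming(String host, double latencyMs, LatencyForSnitch usable) {
        if (usable == LatencyForSnitch.NO) {
            return; // never used by the snitch
        }
        Deque<Double> window = samples.computeIfAbsent(host, h -> new ArrayDeque<>());
        if (usable == LatencyForSnitch.MAYBE && window.size() > 1) {
            return; // Echo/Ping RTT only counts when the host has one or fewer samples
        }
        if (window.size() == WINDOW_SIZE) {
            window.removeFirst(); // evict the oldest sample
        }
        window.addLast(latencyMs);
    }

    // Proposed reset: keep only each host's minimum observed latency instead of clearing.
    void reset() {
        for (Deque<Double> window : samples.values()) {
            if (window.isEmpty()) {
                continue;
            }
            double min = Double.POSITIVE_INFINITY;
            for (double sample : window) {
                min = Math.min(min, sample);
            }
            window.clear();
            window.addLast(min);
        }
    }

    int sampleCount(String host) {
        Deque<Double> window = samples.get(host);
        return window == null ? 0 : window.size();
    }

    // Mean latency; a host that was never measured still falls back to 0.0 in this sketch.
    double score(String host) {
        Deque<Double> window = samples.get(host);
        if (window == null || window.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double sample : window) {
            sum += sample;
        }
        return sum / window.size();
    }
}
```

For example, if a host's samples are [80.0, 82.0, 120.0], `reset()` leaves a single 80.0 sample, so its score becomes its best observed latency rather than zero, and the next MAYBE ping is still accepted because the host has only one sample.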

This strategy also handles the "a host got slow but now it's fine" problem that the DS resets were (afaik) designed to solve, because the EchoMessage ping latency will count only after the reset for that host. Ping latency is a more reasonable lower bound on host latency than the status quo of zero.

Issue Links

is related to:
- CASSANDRA-15224 DynamicSnitch.applyConfigChanges can corrupt snitch state (Open)
- CASSANDRA-14817 Revisit node health, connection health, load balancing, and liveness (Open)

links to:
- GitHub Pull Request #283

People

Assignee: Joey Lynch
Reporter: Joey Lynch
Authors: Joey Lynch
Reviewers: Ariel Weisberg, Blake Eggleston
Votes: 0
Watchers: 15

Dates

Created: 19/May/18 18:57
Updated: 07/Mar/23 10:54

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 25.5h