[CASSANDRA-14252] Use zero as default score in DynamicEndpointSnitch - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 4.0-alpha1, 4.0
Component/s: Legacy/Coordination
Labels: None
Severity: Normal

Description

In our deployment, I found that a single slow but alive data node can slow down the whole cluster and even cause our requests to time out. That is the problem I want to solve.

We are using DynamicEndpointSnitch with badness_threshold set to 0.1. I expected DynamicEndpointSnitch to switch to sortByProximityWithScore when the local data node's latency is too high.

I added some debug logging and found that in many cases the score for the remote data node was not populated, so the fallback to sortByProximityWithScore never happened. That is why a single slow data node can cause huge problems for the whole cluster.

In this ticket, I'd like to use zero as the default score, so that we get a chance to try a remote data node when the local one is slow.
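
The effect of the proposed default can be illustrated with a minimal, self-contained sketch. This is not the actual DynamicEndpointSnitch code: the class name DefaultScoreSketch, the method shouldSortByScore, the defaultToZero flag, and the use of plain strings for endpoints are all hypothetical, and the comparison against the threshold is an assumption based on the behaviour described above. It only shows why a missing remote score prevents the fallback, and why treating it as zero allows it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class DefaultScoreSketch {
    // Same value as the badness_threshold mentioned in the description.
    static final double BADNESS_THRESHOLD = 0.1;

    /**
     * Returns true if the proximity ordering should be replaced by a
     * score-based ordering. Hypothetical helper, for illustration only.
     */
    static boolean shouldSortByScore(List<String> proximityOrdered,
                                     Map<String, Double> scores,
                                     boolean defaultToZero) {
        List<Double> inProximityOrder = new ArrayList<>();
        for (String endpoint : proximityOrdered) {
            Double score = scores.get(endpoint);
            if (score == null) {
                if (!defaultToZero) {
                    // Reported behaviour: a missing score ends the check,
                    // so the proximity ordering is always kept.
                    return false;
                }
                // Proposed behaviour: treat an unknown endpoint as score 0.
                score = 0.0;
            }
            inProximityOrder.add(score);
        }

        // Compare each score, position by position, with the best possible
        // (sorted) arrangement of the same scores.
        List<Double> sorted = new ArrayList<>(inProximityOrder);
        Collections.sort(sorted);
        for (int i = 0; i < sorted.size(); i++) {
            if (inProximityOrder.get(i) > sorted.get(i) * (1.0 + BADNESS_THRESHOLD)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> endpoints = Arrays.asList("ip1", "ip2");
        // ip1 (local, slow) has a score; ip2 (remote) has none yet.
        Map<String, Double> scores = Collections.singletonMap("ip1", 1.0);

        System.out.println(shouldSortByScore(endpoints, scores, false)); // prints "false"
        System.out.println(shouldSortByScore(endpoints, scores, true));  // prints "true"
    }
}
```

In this sketch the slow local node ip1 has a score of 1.0 and the remote node ip2 has none. Without a default the check gives up early; with a default of zero, ip1's score exceeds the threshold relative to ip2's, so the score-based ordering would be used.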

I tested it in our test cluster, and it significantly improved client latency in the single-slow-data-node case.

I flagged this as a Bug because it has caused problems for our use cases multiple times.

==== logs ====

2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: sortByProximityWithBadness: after sorting by proximity, addresses order change to [ip1, ip2], with scores [1.0]
2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: sortByProximityWithBadness: after sorting by proximity, addresses order change to [ip1, ip2], with scores [0.0]
2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: sortByProximityWithBadness: after sorting by proximity, addresses order change to [ip1, ip2], with scores [1.0]
2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: sortByProximityWithBadness: after sorting by proximity, addresses order change to [ip1, ip2], with scores [1.0]

Attachments

IMG_3180.jpg | 19/Mar/18 17:17 | 75 kB | Jay Zhuang

Issue Links

relates to

CASSANDRA-14555 Verify effect of CASSANDRA-14252 on streaming endpoint selection

Open

People

Assignee: Dikang Gu

Reporter: Dikang Gu

Authors: Dikang Gu

Reviewers: Jay Zhuang

Votes: 0

Watchers: 10

Dates

Created: 21/Feb/18 23:46

Updated: 15/May/20 08:07

Resolved: 20/Mar/18 22:19