Details
- Type: Improvement
- Status: Resolved
- Resolution: Fixed
- Priority: Normal
- Change Category: Operability
- Complexity: Normal
- Platform: All
Description
We issue writes to Cassandra as logged batches (RF=3; consistency levels TWO, QUORUM, or LOCAL_QUORUM).
On clusters of any size, a single extremely slow node causes a ~90% loss of cluster-wide throughput for batched writes. We can reproduce this in the lab via CPU or disk throttling, and we observe it on 3.11, 4.0, and 4.1.
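For concreteness, here is a minimal sketch of that write pattern using the DataStax Java driver 4.x. The keyspace, table, and values are hypothetical, and LOCAL_QUORUM is used as one of the consistency levels mentioned above.
{code:java}
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.BatchStatement;
import com.datastax.oss.driver.api.core.cql.DefaultBatchType;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class LoggedBatchExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // A LOGGED batch: the coordinator persists the batch to the batchlog
            // on replica nodes before the individual mutations are applied.
            BatchStatement batch = BatchStatement.builder(DefaultBatchType.LOGGED)
                    .addStatement(SimpleStatement.newInstance(
                            "INSERT INTO ks.events (id, payload) VALUES (1, 'a')"))
                    .addStatement(SimpleStatement.newInstance(
                            "INSERT INTO ks.events (id, payload) VALUES (2, 'b')"))
                    .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_QUORUM)
                    .build();
            session.execute(batch);
        }
    }
}
{code}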
It appears the mechanism in play is:
Those logged batches are first written to the batchlog on two replica nodes, and the actual mutations are not processed until both of those nodes acknowledge the batch. The batchlog endpoints are selected randomly from all nodes in the local data center that are currently up in gossip. If a single node is slow but still considered up in gossip, this eventually causes every other node to have all of its MutationStage threads waiting while the slow replica accepts batchlog writes.
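To illustrate why this degrades the whole cluster rather than just the slow node, here is a small, self-contained toy model (not Cassandra code; the node count, latencies, and thread counts are made up): each "batch" blocks until two randomly chosen endpoints acknowledge it, so a single slow-but-alive endpoint stalls a large share of the coordinator threads.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model (not Cassandra code) of the blocking behaviour described above. */
public class SlowBatchlogEndpointSimulation {
    static final int NODES = 12;          // nodes "up" in the local DC
    static final int SLOW_NODE = 0;       // the throttled node
    static final long FAST_ACK_MS = 2;    // healthy batchlog ack latency
    static final long SLOW_ACK_MS = 500;  // throttled batchlog ack latency

    public static void main(String[] args) throws InterruptedException {
        AtomicLong completed = new AtomicLong();
        ExecutorService coordinators = Executors.newFixedThreadPool(32);

        Runnable loggedBatch = () -> {
            // Two distinct batchlog endpoints chosen at random, mimicking the
            // shuffle-based selection from all nodes considered alive.
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            int a = rnd.nextInt(NODES);
            int b;
            do { b = rnd.nextInt(NODES); } while (b == a);
            // The mutations cannot proceed until BOTH endpoints acknowledge,
            // so the batch waits for the slower of the two.
            long waitMs = Math.max(ackMs(a), ackMs(b));
            try { Thread.sleep(waitMs); } catch (InterruptedException ignored) { }
            completed.incrementAndGet();
        };

        for (int i = 0; i < 100_000; i++) coordinators.submit(loggedBatch);
        Thread.sleep(5_000);
        System.out.println("batches completed in 5s: " + completed.get());
        coordinators.shutdownNow();
    }

    static long ackMs(int node) {
        return node == SLOW_NODE ? SLOW_ACK_MS : FAST_ACK_MS;
    }
}
{code}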
The code in play appears to be the method filterBatchlogEndpoints(), which uses Collections.shuffle() to order the candidate endpoints and FailureDetector.isEndpointAlive() to test whether an endpoint is acceptable.
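For readers without the source handy, the selection behaviour those two calls imply is roughly the following standalone sketch (this is not the actual Cassandra implementation; the isAlive predicate stands in for FailureDetector.isEndpointAlive()):
{code:java}
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Simplified stand-in for the batchlog endpoint filtering described above. */
public class BatchlogEndpointFilter {
    /**
     * Returns up to two batchlog endpoints: shuffle all local-DC candidates,
     * then keep the first ones the failure detector still reports as alive.
     * "Alive" is purely up/down - there is no notion of "slow", which is the
     * gap this ticket is about.
     */
    static List<InetAddress> filterBatchlogEndpoints(List<InetAddress> localDcCandidates,
                                                     Predicate<InetAddress> isAlive) {
        List<InetAddress> shuffled = new ArrayList<>(localDcCandidates);
        Collections.shuffle(shuffled);                 // random ordering
        List<InetAddress> chosen = new ArrayList<>(2);
        for (InetAddress endpoint : shuffled) {
            if (!isAlive.test(endpoint))               // up/down check only
                continue;
            chosen.add(endpoint);
            if (chosen.size() == 2)
                break;
        }
        return chosen;
    }
}
{code}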
This behavior turns Cassandra from a multi-node fault-tolerant system into a collection of single points of failure.
We currently resort to administrator action to kill off extremely slow nodes, but it would be great to have some notion of "which node is a bad choice" when selecting batchlog replica endpoints.
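As a purely illustrative sketch of what such a notion could look like (the latencyScore function and the MAX_RATIO threshold are hypothetical, not an existing Cassandra API), the filter could drop endpoints whose recent latency score is far worse than the best live candidate, falling back to the plain alive-filter when that leaves fewer than two endpoints:
{code:java}
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

/** Hypothetical latency-aware variant of the batchlog endpoint filter. */
public class LatencyAwareBatchlogEndpointFilter {
    /** Skip endpoints whose latency score is more than MAX_RATIO times the best live score. */
    static final double MAX_RATIO = 10.0;

    static List<InetAddress> filter(List<InetAddress> localDcCandidates,
                                    Predicate<InetAddress> isAlive,
                                    ToDoubleFunction<InetAddress> latencyScore) {
        // Keep only endpoints the failure detector reports as alive (today's behaviour).
        List<InetAddress> alive = new ArrayList<>();
        for (InetAddress e : localDcCandidates)
            if (isAlive.test(e))
                alive.add(e);

        // Additionally drop endpoints that look much slower than the best candidate.
        double best = alive.stream().mapToDouble(latencyScore).min().orElse(0.0);
        List<InetAddress> healthy = new ArrayList<>();
        for (InetAddress e : alive)
            if (best == 0.0 || latencyScore.applyAsDouble(e) <= best * MAX_RATIO)
                healthy.add(e);

        // Fall back to the plain alive set if the score filter is too aggressive.
        List<InetAddress> pool = healthy.size() >= 2 ? healthy : alive;
        Collections.shuffle(pool);
        return pool.subList(0, Math.min(2, pool.size()));
    }
}
{code}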
Issue Links
- is related to: CASSANDRA-20002 "Add latest test config for dynamic_remote batchlog_endpoint_strategy and new auth parameterizedClass maps" (Resolved)