[CASSANDRA-11724] False Failure Detection in Big Cassandra Cluster - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Normal
Resolution: Unresolved
Fix Version/s: None
Component/s: Legacy/Core
Labels:
- gossip
- node-failure

Severity: Normal

Description

We are testing the stable Cassandra v2.2.5 release on a large cluster. Each machine has 16 cores and runs 8 Cassandra instances, and we test cluster sizes of 32, 64, 128, 256, and 512 instances. Each instance uses the default number of vnodes, which is 256. The data and log directories are on an in-memory tmpfs file system.
We run several types of workloads on this Cassandra cluster:
Workload1: Just start the cluster
Workload2: Start half of the cluster, wait until it reaches a stable condition, and start the other half of the cluster
Workload3: Start half of the cluster, wait until it reaches a stable condition, load some data, and start the other half of the cluster
Workload4: Start the cluster, wait until it reaches a stable condition, load some data, and decommission one node
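To make the scale of each test size concrete, the following is a small sketch that derives the machine count and total token count from the figures stated above (8 instances per machine, 256 vnodes per instance). The function name `cluster_scale` and the constants are illustrative names, not part of Cassandra.

```python
# Figures taken from the setup described above.
INSTANCES_PER_MACHINE = 8
VNODES_PER_INSTANCE = 256
CLUSTER_SIZES = [32, 64, 128, 256, 512]


def cluster_scale(instances):
    """Return (machines, total_tokens) for a given number of instances."""
    machines = instances // INSTANCES_PER_MACHINE
    total_tokens = instances * VNODES_PER_INSTANCE
    return machines, total_tokens


for n in CLUSTER_SIZES:
    machines, tokens = cluster_scale(n)
    print(f"{n:>3} instances: {machines:>2} machines, {tokens:>6} tokens in the ring")
```

At the largest size this gives 64 machines and 131072 tokens in the ring.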

For this testing, we measure the total number of false failure detections inside the cluster. By false failure detection we mean that, for example, instance-1 marks instance-2 as down although instance-2 is not down. We dug deeper into the root cause and found that instance-1 had not received a heartbeat from instance-2 for some time because instance-2 was running a long computation.
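The scenario above can be illustrated with a minimal sketch of an accrual-style failure detector. This is a simplified model, not Cassandra's actual implementation: the class name `SimpleAccrualDetector` is hypothetical, heartbeat intervals are assumed to be exponentially distributed, and the threshold of 8.0 is an assumed value chosen to resemble a typical conviction threshold. The timestamps are made up for illustration and are not taken from the experiment.

```python
import math
from collections import deque


class SimpleAccrualDetector:
    """Simplified accrual failure detector for one remote instance (illustrative)."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold            # assumed conviction threshold
        self.intervals = deque(maxlen=window) # recent heartbeat inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: -log10 of P(silence >= elapsed) under an exponential model."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))

    def is_suspected(self, now):
        return self.phi(now) > self.threshold


# instance-1's view of instance-2: heartbeats arrive once per second for 10 s,
# then instance-2 stalls in a long computation but is still alive.
detector = SimpleAccrualDetector()
for t in range(11):
    detector.heartbeat(float(t))

print(detector.is_suspected(12.0))  # False: 2 s of silence, phi is about 0.87
print(detector.is_suspected(40.0))  # True: 30 s of silence, phi is about 13.03
```

In this model the remote instance is marked down as soon as the silence exceeds roughly 18.4 mean heartbeat intervals (8 × ln 10), regardless of whether the process has crashed or is only busy, which is the kind of false detection counted in these tests.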

The graphs of the results for each workload are attached.

Attachments

- experiment-result.txt, 07/May/16 19:50, 50 kB, Jeffrey F. Lukman
- Workload1.jpg, 06/May/16 04:43, 28 kB, Jeffrey F. Lukman
- Workload2.jpg, 06/May/16 04:43, 30 kB, Jeffrey F. Lukman
- Workload3.jpg, 06/May/16 04:43, 29 kB, Jeffrey F. Lukman
- Workload4.jpg, 06/May/16 04:43, 26 kB, Jeffrey F. Lukman

People

Assignee: Unassigned

Reporter: Jeffrey F. Lukman

Votes: 0

Watchers: 11

Dates

Created: 06/May/16 04:43

Updated: 16/Apr/19 09:30