[CASSANDRA-14480] Digest mismatch requires all replicas to be responsive - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Duplicate
Fix Version/s: None
Component/s: Legacy/Core
Labels: None
Severity: Normal

Description

I ran across a scenario where a digest mismatch triggers a read repair that requires every node marked up to respond. If one of these nodes is not responding, the read repair is reported to the client as a ReadTimeoutException.

My expectation would be that a CL=QUORUM read always succeeds as long as 2 of the 3 replicas are responding. Unfortunately, a third node that is marked "up" in the ring but cannot respond still leads to a ReadTimeoutException (RTE).

I came up with a scenario that reproduces the issue:

1. Set up a 3-node cluster using ccm.
2. Increase phi_convict_threshold to 16 so that nodes are permanently reported as up.
3. Create the attached schema (schema_14480.cql).
4. Run the attached reader and writer (Reader.java, Writer.java), which connect only to node1 and node2. This alone should already produce digest mismatches.
5. Run "ccm node3 pause".

The reader will then report a read timeout with consistency QUORUM (2 responses were required but only 1 replica responded). Inside the DigestMismatchException catch block, the repairHandler can be seen waiting for 3 responses, even though the exception says that only 2 responses are required.
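To make the arithmetic in this report explicit, here is a minimal, hypothetical Java model. It is not Cassandra code: the class DigestMismatchModel and its methods are invented for illustration. It assumes three replicas with node3 paused but still marked up. It compares blocking for a quorum against blocking for every contacted replica after a digest mismatch, which is the behaviour observed in the repairHandler.

```java
import java.util.List;

// Hypothetical model of the reported behaviour; none of these names are Cassandra APIs.
class DigestMismatchModel {

    // Responses a QUORUM read needs: floor(rf / 2) + 1.
    static int quorumFor(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Returns true if enough replicas respond for the read to complete.
    // waitForAllContacted models the reported behaviour: after a digest mismatch
    // the repair handler blocks for every contacted replica instead of a quorum.
    static boolean readSucceeds(List<Boolean> replicaResponds,
                                boolean digestMismatch,
                                boolean waitForAllContacted) {
        int contacted = replicaResponds.size();
        int responded = (int) replicaResponds.stream().filter(r -> r).count();
        int blockFor = quorumFor(contacted);
        if (digestMismatch && waitForAllContacted) {
            blockFor = contacted; // assumption: all replicas were contacted, as in the report
        }
        return responded >= blockFor;
    }

    public static void main(String[] args) {
        // node1 and node2 respond; node3 is paused but still reported as up.
        List<Boolean> replicas = List.of(true, true, false);
        System.out.println("quorum = " + quorumFor(replicas.size()));
        System.out.println("no mismatch: " + readSucceeds(replicas, false, true));
        System.out.println("mismatch, block for all contacted: " + readSucceeds(replicas, true, true));
        System.out.println("mismatch, block for quorum: " + readSucceeds(replicas, true, false));
    }
}
```

With node3 paused, the quorum of 2 is met, but a handler that blocks for all 3 contacted replicas can never complete. That is consistent with the read timeout described above.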

Attachments

Reader.java (2 kB), Christian Spriegel, 30/May/18 13:24
schema_14480.cql (0.3 kB), Christian Spriegel, 30/May/18 13:21
Writer.java (2 kB), Christian Spriegel, 30/May/18 13:24

Issue Links

duplicates: CASSANDRA-10726 Read repair inserts should not be blocking (Resolved)
relates to: CASSANDRA-7868 Sporadic CL switch from LOCAL_QUORUM to ALL (Resolved)

People

Assignee: Unassigned
Reporter: Christian Spriegel
Votes: 0
Watchers: 6

Dates

Created: 30/May/18 13:18
Updated: 16/Apr/19 09:29
Resolved: 18/Jun/18 19:17