Details
- Bug
- Status: Resolved
- Resolution: Duplicate
Description
I ran across a scenario where a digest mismatch triggers a read repair that requires all up nodes to respond. If one of these nodes does not respond, the read repair is reported to the client as a ReadTimeoutException.
My expectation is that a read at CL=QUORUM will always succeed as long as 2 nodes are responding. Unfortunately, a third node that is "up" in the ring but unable to respond leads to an RTE.
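To make the expectation concrete, here is a minimal sketch of the quorum arithmetic. It assumes a replication factor of 3 (the attached schema is not shown here, so that value is an assumption), and the helper names are hypothetical, not Cassandra APIs.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for CL=QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def quorum_read_expected_to_succeed(replication_factor: int, responding: int) -> bool:
    """Expected behaviour: the read succeeds once a majority has answered."""
    return responding >= quorum(replication_factor)


# Assumed RF=3 with node3 unresponsive: two replicas still answer.
print(quorum(3))                              # majority size for RF=3
print(quorum_read_expected_to_succeed(3, 2))  # expectation stated above
```

Under that assumption, two responding replicas should be enough, which is why the timeout is surprising.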
I came up with a scenario that reproduces the issue:
- set up a 3-node cluster using ccm
- increase the phi_convict_threshold to 16, so that nodes are permanently reported as up
- create the attached schema
- run the attached reader and writer (which only connect to node1 and node2). This should already produce digest mismatches
- run "ccm node3 pause"
- The reader will report a read-timeout with consistency QUORUM (2 responses were required but only 1 replica responded). Within the DigestMismatchException catch-block it can be seen that the repairHandler is waiting for 3 responses, even though the exception says that 2 responses are required.
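The behaviour observed in the steps above can be illustrated with a toy model. This is only a sketch of the reported symptom, not Cassandra's actual code: the function names, the `ReadTimeout` class, and the node labels are hypothetical, and the model assumes RF=3 with all three replicas still considered up because of the raised phi_convict_threshold.

```python
class ReadTimeout(Exception):
    """Stand-in for the ReadTimeoutException seen by the client."""


def await_responses(targets, responsive, block_for):
    """Count answers from targets; time out if fewer than block_for arrive."""
    received = sum(1 for node in targets if node in responsive)
    if received < block_for:
        raise ReadTimeout(f"{block_for} responses required, {received} received")
    return received


def quorum_read(replicas, responsive, digest_mismatch):
    """Toy QUORUM read: the initial read blocks for a majority, but the
    repair after a digest mismatch blocks for every replica considered up."""
    block_for = len(replicas) // 2 + 1
    await_responses(replicas, responsive, block_for)
    if digest_mismatch:
        # Reported symptom: the repair handler waits for all 3 replicas.
        await_responses(replicas, responsive, len(replicas))
    return "ok"


replicas = ["node1", "node2", "node3"]
responsive = {"node1", "node2"}  # node3 is paused but still marked up

print(quorum_read(replicas, responsive, digest_mismatch=False))
try:
    quorum_read(replicas, responsive, digest_mismatch=True)
except ReadTimeout as exc:
    print("read timed out:", exc)
```

In this model the read without a digest mismatch succeeds with two responsive replicas, while the same read with a mismatch times out because the repair step waits on the paused node.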
Attachments
Issue Links
- duplicates
  CASSANDRA-10726 Read repair inserts should not be blocking (Resolved)
- relates to
  CASSANDRA-7868 Sporadic CL switch from LOCAL_QUORUM to ALL (Resolved)