  1. Apache Cassandra
  2. CASSANDRA-14480

Digest mismatch requires all replicas to be responsive

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Duplicate
    • Fix Version/s: None
    • Component/s: Legacy/Core
    • Labels: None
    • Severity: Normal

    Description

      I ran across a scenario where a digest mismatch triggers a read repair that requires every node marked up to respond. If one of these nodes does not respond, the failed read repair is reported to the client as a ReadTimeoutException.

      My expectation would be that a read at CL=QUORUM always succeeds as long as 2 nodes are responding. Unfortunately, a third node that is "up" in the ring but unable to respond does lead to an RTE (ReadTimeoutException).
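
      The expectation above follows from the usual quorum arithmetic, floor(RF / 2) + 1. A minimal sketch, assuming a replication factor of 3 (the attached schema is not shown here, so this is an assumption), and using a hypothetical helper class that is not part of Cassandra:

```java
// Hypothetical helper, not Cassandra source: quorum arithmetic for this scenario.
final class QuorumSketch {
    /** Replicas a QUORUM operation must hear from: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True if the number of responsive replicas is enough for QUORUM. */
    static boolean quorumReachable(int replicationFactor, int responsiveReplicas) {
        return responsiveReplicas >= quorum(replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3; // assumed replication factor
        System.out.println("QUORUM for RF=3 blocks for " + quorum(rf) + " replicas");
        System.out.println("Reachable with 2 responsive replicas: " + quorumReachable(rf, 2));
    }
}
```

      With RF=3, two responsive replicas satisfy QUORUM, which is why a timeout in this situation is unexpected.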

      I came up with a scenario that reproduces the issue:

      1. Set up a 3-node cluster using ccm.
      2. Increase phi_convict_threshold to 16, so that nodes are permanently reported as up.
      3. Create the attached schema.
      4. Run the attached reader and writer (which only connect to node1 and node2). This should already produce digest mismatches.
      5. Run "ccm node3 pause".
      6. The reader will report a read timeout at consistency QUORUM (2 responses were required but only 1 replica responded). Within the DigestMismatchException catch block, it can be seen that the repairHandler is waiting for 3 responses, even though the exception says that 2 responses are required.
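
      The mismatch described in step 6 can be modelled in isolation. The sketch below is not Cassandra code; the class and method names are hypothetical, and it only encodes what the report observes: the initial read waits for a quorum of responses, while the repair read waits for all three contacted replicas.

```java
import java.util.List;

// Hypothetical model of the observed behaviour, not Cassandra source.
final class BlockForSketch {
    /** True if at least `blockFor` of the contacted replicas respond. */
    static boolean completes(List<Boolean> replicaResponds, int blockFor) {
        long responses = replicaResponds.stream().filter(Boolean::booleanValue).count();
        return responses >= blockFor;
    }

    public static void main(String[] args) {
        // node1 and node2 respond; node3 is paused but still marked up.
        List<Boolean> replicas = List.of(true, true, false);
        int quorum = replicas.size() / 2 + 1;   // 2 for three replicas
        int repairBlockFor = replicas.size();   // 3, as observed in the catch block

        System.out.println("initial QUORUM read completes: " + completes(replicas, quorum));
        System.out.println("repair read completes: " + completes(replicas, repairBlockFor));
    }
}
```

      In this model the initial read completes but the repair read does not, which corresponds to the timeout the client sees even though a quorum of replicas is responsive.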

      Attachments

        1. Reader.java (2 kB, Christian Spriegel)
        2. schema_14480.cql (0.3 kB, Christian Spriegel)
        3. Writer.java (2 kB, Christian Spriegel)

            People

              Assignee: Unassigned
              Reporter: Christian Spriegel (christianmovi)
              Votes: 0
              Watchers: 6