  1. Cassandra
  2. CASSANDRA-3641

inconsistent/corrupt counters w/ broken shards never converge


Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 1.1.0
    • None
    • None
    • Normal

    Description

      We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had counters that were corrupt (hopefully due to CASSANDRA-3178). The corruption was that there would exist shards with the same node_id, same clock id, but different counts.

      The counter column diffing and reconciliation code assumes that this never happens, and ignores the count. The problem with this is that if there is an inconsistency, the result of a reconciliation will depend on the order of the shards.
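The order dependence can be illustrated with a minimal sketch. This is not Cassandra's actual reconciliation code: `Shard` and `NaiveShardMerge` are hypothetical names, and the merge rule is a simplified model that compares only the clock and, on a tie, assumes the two shards are identical.

```java
// Hypothetical, simplified model of a counter shard (not Cassandra's classes).
final class Shard {
    final String nodeId;
    final long clock;
    final long count;

    Shard(String nodeId, long clock, long count) {
        this.nodeId = nodeId;
        this.clock = clock;
        this.count = count;
    }
}

final class NaiveShardMerge {
    // Keeps the shard with the higher clock. On equal clocks it assumes the
    // shards are identical and keeps the left one, never looking at count.
    static Shard merge(Shard left, Shard right) {
        if (!left.nodeId.equals(right.nodeId)) {
            throw new IllegalArgumentException("shards belong to different nodes");
        }
        return right.clock > left.clock ? right : left;
    }

    public static void main(String[] args) {
        // Same node id, same clock, different counts: the corrupt case.
        Shard a = new Shard("node-1", 5L, 10L);
        Shard b = new Shard("node-1", 5L, 12L);
        System.out.println(merge(a, b).count); // prints 10
        System.out.println(merge(b, a).count); // prints 12
    }
}
```

Under this assumed rule the merged value depends only on which replica's shard happens to be on the left, which matches the fluctuating reads described here.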

      In our case, for example, we would see the value of the counter fluctuate randomly on a CL.ALL read, but we would get a consistent value (whatever the node had) on a CL.ONE read (submitted to one of the nodes in the replica set for the key).

      In addition, read repair would not work despite digest mismatches because the diffing algorithm also did not care about the counts when determining the differences to send.

      I'm attaching patches that fix this. The first patch is against our 0.8 branch, which is not terribly useful to people, but I include it because it is the well-tested version that we have used on the production cluster which was subject to this corruption.

      The other patch is against trunk, and contains the same change.

      What the patch does is:

      • On diffing, treat as DISJOINT if there is a count discrepancy.
      • On reconciliation, look at the count and deterministically pick the higher one, and:
        • log the fact that we detected a corrupt counter
        • increment a JMX observable counter for monitoring purposes
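The behavior listed above could look roughly like the following sketch. It is an illustration under stated assumptions, not the attached patch: `CounterShard`, `Relationship`, `PatchedShardOps`, and `corruptShardsDetected` are hypothetical names, a plain `AtomicLong` stands in for the JMX-observable counter, and both shards are assumed to carry the same node id.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical, simplified model of a counter shard (not Cassandra's classes).
final class CounterShard {
    final long clock;
    final long count;

    CounterShard(long clock, long count) {
        this.clock = clock;
        this.count = count;
    }
}

// Hypothetical outcome of diffing two shards that share a node id.
enum Relationship { EQUAL, GREATER_THAN, LESS_THAN, DISJOINT }

final class PatchedShardOps {
    private static final Logger LOG = Logger.getLogger(PatchedShardOps.class.getName());

    // Stand-in for the JMX-observable counter; a real MBean would expose this value.
    static final AtomicLong corruptShardsDetected = new AtomicLong();

    // Diffing: equal clocks with different counts are reported as DISJOINT,
    // so the discrepancy is not silently treated as "nothing to send".
    static Relationship diff(CounterShard left, CounterShard right) {
        if (left.clock > right.clock) return Relationship.GREATER_THAN;
        if (left.clock < right.clock) return Relationship.LESS_THAN;
        return left.count == right.count ? Relationship.EQUAL : Relationship.DISJOINT;
    }

    // Reconciliation: the higher clock wins; on equal clocks with different
    // counts, pick the higher count deterministically, log, and bump the counter.
    static CounterShard reconcile(CounterShard left, CounterShard right) {
        if (left.clock != right.clock) {
            return left.clock > right.clock ? left : right;
        }
        if (left.count == right.count) {
            return left;
        }
        corruptShardsDetected.incrementAndGet();
        LOG.warning("corrupt counter shard: clock " + left.clock
                + " has counts " + left.count + " and " + right.count);
        return left.count > right.count ? left : right;
    }

    public static void main(String[] args) {
        CounterShard a = new CounterShard(5L, 10L);
        CounterShard b = new CounterShard(5L, 12L);
        System.out.println(diff(a, b));            // prints DISJOINT
        System.out.println(reconcile(a, b).count); // prints 12
        System.out.println(reconcile(b, a).count); // prints 12
    }
}
```

Because the tie-break depends only on the counts, the result is the same regardless of argument order, which is what lets replicas converge once the reconciled value is propagated.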

      A cluster that is subject to such corruption and has this patch will fix itself with an AES + compact (or just repeated compactions, assuming the replicate-on-compact is able to deliver correctly).

      Attachments

        1. 3641-trunk.txt
          7 kB
          Peter Schuller
        2. 3641-0.8-internal-not-for-inclusion.txt
          7 kB
          Peter Schuller
        3. CASSANDRA-3641-trunk-nojmx.txt
          4 kB
          Peter Schuller

            People

              scode Peter Schuller
              scode Peter Schuller
              Peter Schuller
              Sylvain Lebresne