Cassandra / CASSANDRA-12991

Inter-node race condition in validation compaction


Details

    • Type: Improvement
    • Status: Open
    • Priority: Low
    • Resolution: Unresolved
    • None
    • Component/s: Consistency/Repair
    • None

    Description

      Problem:
      When a repair triggers validation compactions, in-flight mutations can make the Merkle trees differ even though the data is actually consistent.

      Example:
      t = 10000:
      Repair starts, triggers validations
      Node A starts validation
      t = 10001:
      Mutation arrives at Node A
      t = 10002:
      Mutation arrives at Node B
      t = 10003:
      Node B starts validation

      The hashes of nodes A and B will differ, although the data is consistent as of t = 10000 (think of it as a snapshot taken at that time).
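      The timeline above can be reproduced in a toy model. This is a hypothetical sketch, not Cassandra's validation code: each replica is a list of (key, value, write_timestamp) cells, a single SHA-256 digest stands in for the Merkle tree, and replica_digest is an illustrative name. It also assumes the in-flight mutation carries write timestamp 10001. Hashing everything each node holds when its validation starts reproduces the mismatch; hashing only cells written at or before t = 10000 does not.

```python
import hashlib
from typing import Iterable, Optional, Tuple

# Hypothetical, simplified cell model: (partition_key, value, write_timestamp).
Cell = Tuple[str, str, int]


def replica_digest(cells: Iterable[Cell], snapshot_ts: Optional[int] = None) -> str:
    """Hash a replica's cells in key order, optionally skipping cells newer than snapshot_ts.

    A single SHA-256 digest stands in for a Merkle tree in this sketch.
    """
    h = hashlib.sha256()
    for key, value, ts in sorted(cells):
        if snapshot_ts is not None and ts > snapshot_ts:
            continue  # written after the snapshot point: leave it out of the hash
        h.update(f"{key}|{value}|{ts};".encode())
    return h.hexdigest()


# Both replicas agree on everything written before the repair started.
base = [("k1", "a", 9000), ("k2", "b", 9500)]
mutation = ("k3", "c", 10001)  # assumed write timestamp of the in-flight mutation

node_a = list(base)          # A validated at t = 10000, before the mutation arrived
node_b = base + [mutation]   # B validated at t = 10003, after the mutation arrived

print(replica_digest(node_a) == replica_digest(node_b))                # False
print(replica_digest(node_a, 10000) == replica_digest(node_b, 10000))  # True
```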

      Impact:
      This causes unnecessary streaming. The impact may be small for low-traffic CFs and partitions, but for high-traffic CFs and possibly very large partitions it can be significant and wastes resources.

      Possible solution:
      Build the hashes as of a snapshot timestamp.
      This requires filtering out everything written after that timestamp when reading the SSTables during a validation compaction:

      • Cells with timestamp > snapshot time have to be removed
      • Range tombstone markers have to be handled
      • Bound markers have to be removed if their deletion timestamp > snapshot time
      • Boundary markers have to be either changed to a bound or removed completely, depending on whether one or both of their sides (the closing and the opening deletion) are newer than the snapshot time
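      The filtering rules listed above can be sketched one item at a time. The model below is a hypothetical simplification, not Cassandra's storage engine: a partition is a list of cells, range tombstone bounds and boundary markers sorted by clustering position, and filter_item / filter_partition are illustrative names. Each item is kept, dropped, or (for a boundary) downgraded to a bound according to those rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Cell:
    position: int   # clustering position within the partition
    value: str
    timestamp: int  # write timestamp


@dataclass(frozen=True)
class Bound:
    position: int
    is_open: bool     # True = start of a deleted range, False = end of it
    deletion_ts: int  # timestamp of the range deletion


@dataclass(frozen=True)
class Boundary:
    position: int
    close_ts: int  # deletion timestamp of the range that ends here
    open_ts: int   # deletion timestamp of the range that starts here


Item = Union[Cell, Bound, Boundary]


def filter_item(item: Item, snapshot_ts: int) -> Optional[Item]:
    """Apply the snapshot rules to one item; return None if it must be dropped."""
    if isinstance(item, Cell):
        return item if item.timestamp <= snapshot_ts else None
    if isinstance(item, Bound):
        return item if item.deletion_ts <= snapshot_ts else None
    close_ok = item.close_ts <= snapshot_ts
    open_ok = item.open_ts <= snapshot_ts
    if close_ok and open_ok:
        return item                                       # both sides predate the snapshot
    if close_ok:
        return Bound(item.position, False, item.close_ts)  # only the closing side survives
    if open_ok:
        return Bound(item.position, True, item.open_ts)    # only the opening side survives
    return None                                            # both sides are too new


def filter_partition(items: List[Item], snapshot_ts: int) -> List[Item]:
    """Filter a partition's items (sorted by position) as of snapshot_ts."""
    out: List[Item] = []
    for item in items:
        kept = filter_item(item, snapshot_ts)
        if kept is not None:
            out.append(kept)
    return out


items = [
    Bound(0, True, 9000),        # older range deletion starts
    Boundary(4, 9000, 10002),    # older range ends, newer one starts
    Bound(8, False, 10002),      # newer range deletion ends
    Cell(10, "x", 10001),        # written after the snapshot
    Cell(12, "y", 9500),         # written before the snapshot
]
print(filter_partition(items, 10000))
```

      With the snapshot at t = 10000, the boundary at position 4 keeps only its closing side and becomes a closing bound, the newer range deletion ending at position 8 and the cell written at 10001 are dropped, and the older cell survives. The sketch treats each marker independently, so it does not try to reconstruct an older deletion that a newer, filtered-out range tombstone had shadowed.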

      This is probably known behaviour. Have there been any discussions about it in the past? I did not find a matching issue, so I created this one.

      Any feedback is welcome.

          People

            Assignee: Unassigned
            Reporter: Benjamin Roth (brstgt)
            Votes: 1
            Watchers: 6