Cassandra / CASSANDRA-7489

Track lower bound necessary for a repair, live, without actually repairing

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Later

    Description

      We will need a few things in place to get this right, but it should be possible to track, live, the current health of a single range across the cluster. Suppose we force an owning node to be the coordinator for every update: if a non-smart client sends a mutation to a non-owning node, that node simply proxies it to an owning node, which coordinates the update. The overhead should tend towards minimal as smart clients become the norm, and smart clients scale up to cope with huge clusters. Each owner can then maintain the oldest timestamp of any update it has coordinated that was not acknowledged by every owning node it was propagated to. The minimum of these across all owners of a range is the lower bound from which we need to either repair or retain tombstones. With vnode file segregation we can mark an entire vnode range as repaired up to the most recently determined healthy lower bound.
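
      The bookkeeping described above might be sketched as follows. This is only an illustration under stated assumptions, not Cassandra code: the names RangeCoordinator, coordinate, ack, oldest_unacked and range_lower_bound are hypothetical, timestamps are plain numbers, and "nothing outstanding" is represented as +infinity.

```python
import math


class RangeCoordinator:
    """Hypothetical per-owner tracker for a single token range."""

    def __init__(self, owners):
        self.owners = set(owners)
        # write timestamp -> owners that have not yet acknowledged it
        self.pending = {}

    def coordinate(self, timestamp):
        # A newly coordinated update is unacknowledged by every owner.
        self.pending[timestamp] = set(self.owners)

    def ack(self, timestamp, owner):
        waiting = self.pending.get(timestamp)
        if waiting is None:
            return  # unknown or already fully acknowledged
        waiting.discard(owner)
        if not waiting:
            # Every owner has acknowledged; stop tracking this update.
            del self.pending[timestamp]

    def oldest_unacked(self):
        # Assumption: +inf means this owner has nothing outstanding.
        return min(self.pending, default=math.inf)


def range_lower_bound(coordinators):
    """Minimum over all owners' oldest unacknowledged timestamps."""
    return min((c.oldest_unacked() for c in coordinators), default=math.inf)
```

      For example, if one owner has coordinated writes at timestamps 100 and 200 and another a write at 150, and only the write at 100 has been acknowledged by every owner, range_lower_bound over the two trackers returns 150.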

      There are some subtleties with this, but it means tombstones could potentially be cleared only minutes after they are generated, instead of days or weeks. It also means repairs can be even more incremental, operating only over the ranges and time periods we know to be potentially out of sync.
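
      As a rough sketch of how such a lower bound could drive tombstone purging: anything strictly older than the bound is known to have reached every owner, so only tombstones at or after it need to be retained. The function name split_tombstones and the use of bare numeric timestamps are assumptions made for illustration.

```python
def split_tombstones(tombstone_timestamps, lower_bound):
    """Partition tombstone timestamps around a range's healthy lower bound.

    Hypothetical helper: tombstones strictly older than the lower bound are
    treated as purgeable, the rest must be retained until the bound advances.
    """
    purgeable = [t for t in tombstone_timestamps if t < lower_bound]
    retained = [t for t in tombstone_timestamps if t >= lower_bound]
    return purgeable, retained
```

      With a lower bound of 100, tombstones at 90, 100 and 150 split into one purgeable tombstone (90) and two retained ones (100 and 150).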

      It will most likely need RAMP transactions in place, so that atomic batch mutations are not serialized on non-owning nodes. Having owning nodes coordinate updates ensures robustness in the case of a single node failure: all ranges owned by the failed node are then considered to have a lower bound of -Inf, and other ranges are unaffected. Without this, a single node being down would result in the entire cluster being considered out of sync.
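
      The failure handling can be illustrated with a small sketch. It assumes a hypothetical mapping from each range to its owners and a table of per-(node, range) reported bounds; a node that is down, or a missing report, is conservatively treated as -infinity.

```python
import math


def lower_bounds(range_owners, reported, live_nodes):
    """Compute a lower bound per range, treating down owners as -inf.

    range_owners: dict of range name -> list of owning node names
    reported:     dict of (node, range) -> oldest unacknowledged timestamp
    live_nodes:   set of node names currently believed to be up
    All three structures are hypothetical, chosen for illustration.
    """
    bounds = {}
    for rng, owners in range_owners.items():
        bounds[rng] = min(
            reported.get((node, rng), -math.inf) if node in live_nodes
            else -math.inf
            for node in owners
        )
    return bounds
```

      If node n3 is down and owns only range r2, then r2 drops to -infinity while a range owned solely by live nodes keeps its reported minimum.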

      We will still need a short grace period within which clients may send timestamps, and we would have to outright reject any update that arrives with a timestamp close to the expiry of that window. But that window could safely be just minutes.
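
      A minimal sketch of such a rejection rule, assuming timestamps in seconds and two hypothetical parameters (grace_seconds for the window, safety_margin for how close to expiry counts as too close); the specific default values are illustrative only.

```python
def accept_write(write_timestamp, now, grace_seconds=300.0, safety_margin=30.0):
    """Return True if an update's timestamp is safely inside the grace window.

    Hypothetical rule: reject any update whose age is at or beyond
    (grace_seconds - safety_margin). Updates with timestamps in the future
    have a negative age and are accepted by this check.
    """
    age = now - write_timestamp
    return age < grace_seconds - safety_margin
```

      With these defaults, an update stamped 100 seconds ago is accepted, while one stamped 280 seconds ago is rejected even though the five-minute window has not yet fully expired.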

          People

            Assignee: Unassigned
            Reporter: Benedict Elliott Smith (benedict)