Cassandra / CASSANDRA-16047

Potential race condition in creating hard link when incremental backup is turned on


Details

    • Bug
    • Status: Open
    • Urgent
    • Resolution: Unresolved
    • None
    • Local/SSTable
    • None
    • Correctness - Recoverable Corruption / Loss
    • Normal
    • Normal
    • User Report
    • All
    • None

    Description

      There appears to be a race condition when creating hard links while incremental backup is turned on.

      The attached screenshot was captured in a production cluster running Cassandra 3.0.15 after incremental backup was turned on. When this NoSuchFileException occurs, the resulting FSWriteError combined with the default disk failure policy causes the JVM to shut down, so this is a fairly critical bug.

      Because of the risk of production database downtime (if a similar issue hits multiple nodes within a short time frame), and because the same exception has already caused multiple JVM shutdowns, incremental backup had to be turned off for now. This is not an ideal situation.

      The deployment runs in a public cloud environment on EBS-like, SSD-backed disks with decent latency, throughput, and IOPS, so it is hard to believe the culprit lies in the OS or I/O layer. Based on the second screenshot, the affected table is system.size_estimates, which has low flush traffic, so compaction of the source SSTable does not seem to be at play here.
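
      To illustrate the failure mode described above, here is a minimal Java sketch. It is not Cassandra's actual backup code: the class name, the method tryCreateBackupLink, and the file names are hypothetical. It only shows that java.nio.file.Files.createLink throws NoSuchFileException when the source file has already disappeared by the time the link is requested, and one possible way a caller could tolerate that. Whether ignoring the exception would be the right fix for this issue is an open question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class HardLinkRaceSketch {

    /**
     * Hypothetical helper: hard-links `source` into `backupDir`.
     * Returns true if the link was created, false if `source` no longer
     * exists (for example because another thread removed it first).
     */
    public static boolean tryCreateBackupLink(Path source, Path backupDir) throws IOException {
        // Ensure the link's parent exists, so that a NoSuchFileException
        // below can only refer to the missing source file.
        Files.createDirectories(backupDir);
        Path link = backupDir.resolve(source.getFileName());
        try {
            // Files.createLink(link, existing) creates a hard link.
            Files.createLink(link, source);
            return true;
        } catch (NoSuchFileException e) {
            // The source vanished between being chosen and being linked.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hardlink-sketch");
        Path sstable = Files.createFile(dir.resolve("example-Data.db")); // hypothetical name
        Path backups = dir.resolve("backups");

        // Normal case: the source exists when the link is created.
        System.out.println("link created: " + tryCreateBackupLink(sstable, backups));

        // Simulated race: the source is deleted before the link is attempted.
        Files.delete(backups.resolve(sstable.getFileName()));
        Files.delete(sstable);
        System.out.println("link created: " + tryCreateBackupLink(sstable, backups));
    }
}
```

      The sketch assumes a filesystem that supports hard links and keeps the source and the backup directory on the same volume.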


          People

            Assignee: Unassigned
            Reporter: Wei Deng (weideng)