CASSANDRA-17519: races/leaks in SSTableReader::GlobalTidy

Details

    • Bug
    • Status: Patch Available
    • Normal
    • Resolution: Unresolved
    • None
    • Legacy/Core
    • None
    • Correctness - Recoverable Corruption / Loss
    • Normal
    • Normal
    • Unit Test
    • All
    • None

      a simple concurrency unit test is included

    Description

      In Cassandra 4.0 and 3.11 there are at least two races in `SSTableReader::GlobalTidy`.

      The first is a get/get race, which is explicitly handled with an assertion in:

      http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2199-L2204

      That handling amounts to "ok, it's a problem, but let's just not fix it": the race is detected, not prevented.
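
      To make the get/get race concrete, here is a minimal, hypothetical Java model of the check-then-act pattern: a lookup miss, then creation, then `putIfAbsent`, with the loser failing on an assertion. It is a simplification, not the linked Cassandra code: `Tidy` stands in for `GlobalTidy`, string keys stand in for sstable descriptors, and a `CyclicBarrier` test hook (an assumption added only for the demo) forces both callers past the lookup before either inserts, so the failure is deterministic rather than timing-dependent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the get/get race; not Cassandra's code.
final class GetGetRace
{
    // Stand-in for GlobalTidy; the key stands in for an sstable descriptor.
    static final class Tidy
    {
        final String key;
        Tidy(String key) { this.key = key; }
    }

    final ConcurrentMap<String, Tidy> lookup = new ConcurrentHashMap<>();
    // Test-only hook: every caller waits here after a lookup miss, forcing the bad interleaving.
    private final CyclicBarrier afterMiss;

    GetGetRace(int callers) { this.afterMiss = new CyclicBarrier(callers); }

    Tidy get(String key) throws Exception
    {
        Tidy existing = lookup.get(key);
        if (existing != null)
            return existing;
        afterMiss.await();                              // all callers have now seen "absent"
        Tidy created = new Tidy(key);
        Tidy raced = lookup.putIfAbsent(key, created);
        if (raced != null)                              // another caller inserted first
            throw new AssertionError("lost get/get race for " + key);
        return created;
    }

    // Runs two concurrent get() calls for the same key; returns how many hit the assertion.
    static int runTwoGetters()
    {
        GetGetRace registry = new GetGetRace(2);
        AtomicInteger failures = new AtomicInteger();
        Runnable getter = () -> {
            try
            {
                registry.get("sstable-1");
            }
            catch (AssertionError e)
            {
                failures.incrementAndGet();
            }
            catch (Exception e)
            {
                throw new RuntimeException(e);
            }
        };
        Thread a = new Thread(getter);
        Thread b = new Thread(getter);
        a.start();
        b.start();
        try
        {
            a.join();
            b.join();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return failures.get();
    }
}
```

      With the barrier in place, exactly one of the two callers fails; without it the same interleaving is possible but timing-dependent. Letting the map pick the single winner (for example `lookup.computeIfAbsent(key, Tidy::new)`) removes this particular race, but on its own it does nothing about reference counting, which is where the get/tidy race lives.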

      The second is a get/tidy race between

      http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2194-L2196

      and

      http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2174-L2175

      The second race is easy to hit: add a small delay (say, 20 ms) at the beginning of the `tidy()` method and run `LongStreamingTest`. A failure of exactly this kind is what prompted the investigation into `GlobalTidy` correctness.
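
      The delay trick can be modeled deterministically. The hypothetical Java sketch below is a simplified stand-in, not Cassandra's code: `Tidy` has a plain, unchecked reference counter, and two `CountDownLatch` hooks replace the sleep in `tidy()` so the interleaving always happens. One thread finds the entry in the map; meanwhile the last `release()` drops the count to zero and enters `tidy()`; the first thread then increments the dead object's count and walks away with a reference to something that is about to be cleaned up and unregistered.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the get/tidy race; not Cassandra's code.
final class GetTidyRace
{
    static final class Tidy
    {
        final String key;
        final AtomicInteger refs = new AtomicInteger(1);
        volatile boolean tidied = false;    // set once cleanup (e.g. obsoletion) has run
        Tidy(String key) { this.key = key; }
    }

    final ConcurrentMap<String, Tidy> lookup = new ConcurrentHashMap<>();
    // Test-only hooks replacing the "small delay at the beginning of tidy()" with explicit ordering.
    private final CountDownLatch tidyStarted = new CountDownLatch(1);
    private final CountDownLatch getDone = new CountDownLatch(1);

    Tidy acquireExisting(String key)
    {
        Tidy t = lookup.get(key);           // (1) the entry is still in the map...
        awaitLatch(tidyStarted);            // hook: let the last release() reach tidy() first
        t.refs.incrementAndGet();           // (2) ...but its count is already 0; nothing checks that
        getDone.countDown();
        return t;
    }

    void release(Tidy t)
    {
        if (t.refs.decrementAndGet() == 0)
            tidy(t);
    }

    private void tidy(Tidy t)
    {
        tidyStarted.countDown();
        awaitLatch(getDone);                // hook: plays the role of the injected delay
        t.tidied = true;                    // cleanup runs although a reference was just handed out
        lookup.remove(t.key, t);
    }

    // Forces the interleaving once and returns the object the getter ended up holding.
    static Tidy reproduce()
    {
        GetTidyRace registry = new GetTidyRace();
        Tidy original = new Tidy("sstable-1");  // one live reference, owned by the releaser
        registry.lookup.put(original.key, original);
        Tidy[] held = new Tidy[1];
        Thread getter = new Thread(() -> held[0] = registry.acquireExisting(original.key));
        Thread releaser = new Thread(() -> registry.release(original));
        getter.start();
        releaser.start();
        join(getter);
        join(releaser);
        return held[0];
    }

    private static void awaitLatch(CountDownLatch latch)
    {
        try
        {
            latch.await();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    private static void join(Thread thread)
    {
        try
        {
            thread.join();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }
}
```

      After `reproduce()` the getter holds a reference (count 1) to an object whose cleanup has already run and which is no longer registered. Because the model's counter is unchecked, the failure is silent here; in Cassandra's own reference-counting code the symptom may look different, for example a failure when taking the reference.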

      There was an attempt on `trunk` to fix these two races. The details are not clear to me, and the change looks quite odd. I might be mistaken, but as far as I can see the relevant changes were introduced in
      https://github.com/apache/cassandra/commit/31bea0b0d41e4e81095f0d088094f03db14af490
      which was piggybacked on a large change for CASSANDRA-17008, without a separate ticket or any QA.

      As far as I can see, this attempt turns the first race into a leak and replaces the second race with a different one, which allows multiple `GlobalTidy` objects to exist for the same sstable and, as a result, lets the obsoletion code run prematurely.
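
      One way to state what a fix has to guarantee, as implied by the report: at most one live `GlobalTidy` per sstable, a reference is never taken on an object whose count already reached zero, and cleanup (such as obsoletion) runs only after the last reference is dropped and before a replacement is registered. The deliberately coarse Java sketch below keeps that invariant by doing "look up + take a reference" and "drop the last reference + unregister + clean up" under a single lock. It is not the attached patch; `LockedRegistry`, `Tidy`, and `stress` are hypothetical names, and running cleanup while holding a global lock is a simplification chosen for clarity, not performance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the attached patch: one lock makes acquiring a reference and
// dropping the last reference (plus unregistering and cleanup) mutually exclusive.
final class LockedRegistry
{
    static final class Tidy
    {
        final String key;
        int refs = 1;                       // guarded by the registry lock
        volatile boolean tidied = false;    // true once cleanup (e.g. obsoletion) has run
        Tidy(String key) { this.key = key; }
    }

    private final Map<String, Tidy> lookup = new HashMap<>();

    // Returns the single live object for the key, creating it if needed. Objects whose count
    // reached zero were removed under this same lock, so they can never be handed out.
    synchronized Tidy get(String key)
    {
        Tidy t = lookup.get(key);
        if (t == null)
        {
            t = new Tidy(key);
            lookup.put(key, t);
        }
        else
        {
            t.refs++;
        }
        return t;
    }

    // The last release unregisters the object and runs its cleanup before any other
    // get() can proceed, so two live objects for one key never coexist.
    synchronized void release(Tidy t)
    {
        if (t.refs <= 0)
            throw new IllegalStateException("double release of " + t.key);
        if (--t.refs == 0)
        {
            lookup.remove(t.key, t);
            t.tidied = true;                // stand-in for the real cleanup work
        }
    }

    synchronized int liveEntries()
    {
        return lookup.size();
    }

    // Hammers get/release on one key from several threads. Returns the number of times a
    // caller received an already-tidied object, plus any entries left behind (should be 0).
    static int stress(int threads, int iterations)
    {
        LockedRegistry registry = new LockedRegistry();
        AtomicInteger violations = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++)
                {
                    Tidy t = registry.get("sstable-1");
                    if (t.tidied)
                        violations.incrementAndGet();
                    registry.release(t);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers)
        {
            try
            {
                w.join();
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return violations.get() + registry.liveEntries();
    }
}
```

      The trade-off is throughput: every `get()` and `release()` for every sstable contends on one lock, and cleanup blocks unrelated callers while it runs. A finer-grained design would still need to preserve the same guarantees, which is exactly what the leak and the duplicate-`GlobalTidy` outcome described above fail to do.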

      I'll follow up with PRs for the relevant branches.

      Attachments

        1. CASSANDRA-17519-4.0.txt (14 kB, Jakub Zytka)
        2. CASSANDRA-17519-4.1-fix.txt (5 kB, Jakub Zytka)
        3. CASSANDRA-17519-4.1-test-exposing-the-problem.txt (9 kB, Jakub Zytka)

          People

            Jakub Zytka
            Votes: 0
            Watchers: 2