Apache Cassandra / CASSANDRA-19703

Newly inserted prepared statements are evicted from the cache too early, which leads to a race condition

Details

    • Bug
    • Status: Triage Needed
    • Normal
    • Resolution: Unresolved
    • 4.1.x
    • None
    • None
    • All

    Description

      We're upgrading from Cassandra 4.0 to Cassandra 4.1.3, and after the upgrade the system.prepared_statements table started growing to gigabytes in size. This significantly slows down node startup while it runs preloadPreparedStatements.

      I can't share the exact log, but the race condition looks like this:

      1. [Thread 1] Receives a prepared request for S1. Attempts to get S1 in cache
      2. [Thread 1] Cache miss, put this S1 into cache
      3. [Thread 1] Attempts to write S1 into local table
      4. [Thread 2] Receives a prepared request for S2. Attempts to get S2 in cache
      5. [Thread 2] Cache miss, put this S2 into cache
      6. [Thread 2] Cache is full, evicting S1 from cache
      7. [Thread 2] Attempts to delete S1 from local table
      8. [Thread 2] Tombstone inserted for S1, delete finished
      9. [Thread 1] Record inserted for S1, write finished

      Thread 2's tombstone for S1 is written before Thread 1's insert for S1 reaches the table. Because the later insert carries a newer write timestamp than the tombstone, the tombstone does not shadow it, and the S1 record is never removed.
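
      To make the timestamp argument concrete, here is a minimal sketch of last-write-wins reconciliation. It is a toy model, not Cassandra's storage engine: `Cell`, `ToyTable`, `reconcile`, and `replay` are hypothetical names, the timestamps are made up, and the tie-breaking rule is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int          # write time, e.g. microseconds since epoch
    value: Optional[str]    # None marks a tombstone (delete marker)


def reconcile(current: Cell, incoming: Cell) -> Cell:
    """Last-write-wins: the cell with the higher timestamp survives.

    Assumption of this sketch: on equal timestamps the tombstone wins.
    """
    if current.timestamp != incoming.timestamp:
        return current if current.timestamp > incoming.timestamp else incoming
    return current if current.value is None else incoming


class ToyTable:
    """Hypothetical stand-in for system.prepared_statements."""

    def __init__(self) -> None:
        self.cells: Dict[str, Cell] = {}

    def apply(self, key: str, cell: Cell) -> None:
        existing = self.cells.get(key)
        self.cells[key] = cell if existing is None else reconcile(existing, cell)

    def live_keys(self) -> List[str]:
        return sorted(k for k, c in self.cells.items() if c.value is not None)


def replay(insert_ts: int, delete_ts: int) -> List[str]:
    """Apply the delete first and the insert second, as in steps 8 and 9."""
    table = ToyTable()
    table.apply("S1", Cell(delete_ts, None))           # step 8: tombstone lands first
    table.apply("S1", Cell(insert_ts, "SELECT ..."))   # step 9: insert lands second
    return table.live_keys()
```

      In this model, `replay(insert_ts=101, delete_ts=100)` leaves S1 live, which is the leak described above, while `replay(insert_ts=100, delete_ts=101)` leaves nothing behind because the tombstone is newer than the insert.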

      Whether this happens depends on how the cache chooses the next entry to evict when it is full. We noticed that in 4.1.3 Caffeine was upgraded to 2.9.2 (CASSANDRA-15153).
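
      The dependence on the eviction choice can be sketched with a toy cache of capacity 2. This does not model Caffeine's actual eviction policy. The two policies, the extra statement S0 (assumed to be fully persisted already), and the function names are all hypothetical and only illustrate why evicting a recently added entry is what exposes the race.

```python
from typing import Callable, List, Set, Tuple

VictimPolicy = Callable[[List[str]], str]


def oldest_first(candidates: List[str]) -> str:
    """Evict the entry that has been in the cache the longest."""
    return candidates[0]


def newest_first(candidates: List[str]) -> str:
    """Evict the most recently added of the existing entries."""
    return candidates[-1]


def prepare_s2(policy: VictimPolicy) -> Tuple[str, bool]:
    """Return (evicted statement, whether its table write was still in flight)."""
    cache: List[str] = ["S0", "S1"]   # insertion order, oldest first; capacity 2
    in_flight: Set[str] = {"S1"}      # S1 is cached, its table write is unfinished

    cache.append("S2")                # S2 is put into the already full cache
    in_flight.add("S2")
    victim = policy(cache[:-1])       # pick a victim among the older entries
    cache.remove(victim)
    return victim, victim in in_flight
```

      With `oldest_first` the victim is S0, whose insert finished long ago, so the delete is safe. With `newest_first` the victim is S1, whose insert is still in flight, so the delete can land before the insert.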

      I did some research into the Caffeine commits. It seems this commit causes entries to be evicted too early: "Eagerly evict an entry if it too large to fit in the cache" (Feb 2021), available after 2.9.0: https://github.com/ben-manes/caffeine/commit/464bc1914368c47a0203517fda2151fbedaf568b

      It was later fixed in "Improve eviction when overflow or the weight is oversized" (Aug 2022), available after 3.1.2: https://github.com/ben-manes/caffeine/commit/25b7d17b1a246a63e4991d4902a2ecf24e86d234

      Previously an attempt to centralize evictions into one code path led to a suboptimal approach (464bc19). This tried to move those entries into the LRU position for early eviction, but was confusing and could too aggressively evict something that is desirable to keep.

      I upgraded Caffeine to 3.1.8 (the same version as 5.0 trunk) and the issue is gone. However, I think this version is not compatible with Java 8.

      I'm not 100% sure whether this is the root cause or what the correct fix would be. I would appreciate it if anyone could take a look. Thanks.

          People

            cam1982 Cameron Zemek
            yukei Yuqi Yan
            Cameron Zemek