Details
-
Bug
-
Status: Open
-
Normal
-
Resolution: Unresolved
-
None
-
Availability - Unavailable
-
Normal
-
Normal
-
User Report
-
All
-
None
Description
Saw this crash in production on multiple hosts:
ERROR [MutationStage-165] 2024-12-16 04:36:35,914 JVMStabilityInspector.java:68 - Exception in thread Thread[MutationStage-165,5,SharedPool] java.lang.RuntimeException: java.lang.Error: Maximum permit count exceeded at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108) at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45) at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430) at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.Error: Maximum permit count exceeded at java.base/java.util.concurrent.Semaphore$Sync.tryReleaseShared(Semaphore.java:198) at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1382) at java.base/java.util.concurrent.Semaphore.release(Semaphore.java:619) at org.apache.cassandra.db.commitlog.AbstractCommitLogService.requestExtraSync(AbstractCommitLogService.java:297) at org.apache.cassandra.db.commitlog.BatchCommitLogService.maybeWaitForSync(BatchCommitLogService.java:40) at org.apache.cassandra.db.commitlog.AbstractCommitLogService.finishWriteFor(AbstractCommitLogService.java:284) at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:330) at org.apache.cassandra.db.CassandraKeyspaceWriteHandler.addToCommitLog(CassandraKeyspaceWriteHandler.java:100) at org.apache.cassandra.db.CassandraKeyspaceWriteHandler.beginWrite(CassandraKeyspaceWriteHandler.java:54) at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:641) at org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:489) at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:223) at org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:63) at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78) at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97) ... 6 common frames omitted
Digging through, it looks like in 4.1.0 AbstractCommitLogService was reworked to use Semaphore instead of LockSupport. From a skim of the docs, LockSupport.unpark() is safe to run even if the permit's already available where Semaphore.release() is not.
I think that means that if more writes arrive during the right part of a flush, it double-decrements `haveWork`. Once it's decremented and starts throwing, I think it never successfully makes it to another acquire/try
The least-invasive fix is probably to call `haveWork.drainPermits()` in `requestExtraSync()` right before `release(1)` to ensure that `havework.permits` is 0 right before we increment it. That may technically still leave a race, but an almost-impossibly small one rather than a nontrivial one. It might also be possible to catch that specific Exception and run acquire() or (probably better) drainPermits() and recover. That does feel like a weird thing to do for a semaphore exception, but should be OK in this case since it's just being used to sleep the worker loop instead of actual concurrency control.