Details
-
Bug
-
Status: Patch Available
-
Normal
-
Resolution: Unresolved
-
None
-
Availability - Unavailable
-
Normal
-
Normal
-
User Report
-
All
-
None
-
Description
The issue is reported based on an analysis done in "Unexplained stuck memtable flush" mail thread from Cassandra User/Dev groups.
e2e observations: for a Cassandra node (version: 4.1.1) the memtable flush randomly getting stuck and later as a consequence all writes to memtables are stuck as well due to lack of memory in a memory pool.
Based on the analysis of thread dump ( cass_threads.txt.gz ) and heap dump the following cause has been identified:
We have the following stacktrace for "read-hotness-tracker:1" thread
"read-hotness-tracker:1" daemon prio=5 tid=93 WAITING at jdk.internal.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:323) at org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:289) local variable: org.apache.cassandra.utils.concurrent.WaitQueue$Standard$SignalWithListener#2086139 at org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:282) local variable: org.apache.cassandra.utils.concurrent.WaitQueue$Standard$SignalWithListener#2086139 at org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitUninterruptibly(Awaitable.java:186) local variable: org.apache.cassandra.utils.concurrent.WaitQueue$Standard$SignalWithListener#2086139 at org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259) at org.apache.cassandra.db.commitlog.AbstractCommitLogService.awaitSyncAt(AbstractCommitLogService.java:324) local variable: org.apache.cassandra.db.commitlog.PeriodicCommitLogService#1 local variable: com.codahale.metrics.Timer$Context#2086795 at org.apache.cassandra.db.commitlog.PeriodicCommitLogService.maybeWaitForSync(PeriodicCommitLogService.java:42) local variable: org.apache.cassandra.db.commitlog.PeriodicCommitLogService#1 at org.apache.cassandra.db.commitlog.AbstractCommitLogService.finishWriteFor(AbstractCommitLogService.java:284) local variable: org.apache.cassandra.db.commitlog.PeriodicCommitLogService#1 at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:307) local variable: org.apache.cassandra.db.commitlog.CommitLog#1 local variable: org.apache.cassandra.io.util.DataOutputBuffer$1$1#61 local variable: org.apache.cassandra.db.commitlog.CommitLogSegment$Allocation#1 at org.apache.cassandra.db.CassandraKeyspaceWriteHandler.addToCommitLog(CassandraKeyspaceWriteHandler.java:100) at org.apache.cassandra.db.CassandraKeyspaceWriteHandler.beginWrite(CassandraKeyspaceWriteHandler.java:54) local variable: org.apache.cassandra.utils.concurrent.OpOrder$Group#8162 at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:628) local variable: org.apache.cassandra.db.Keyspace#8 local variable: org.apache.cassandra.db.Mutation#54491 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:512) at org.apache.cassandra.db.Mutation.apply(Mutation.java:228) at org.apache.cassandra.db.Mutation.apply(Mutation.java:248) at org.apache.cassandra.cql3.statements.ModificationStatement.executeInternalWithoutCondition(ModificationStatement.java:675) at org.apache.cassandra.cql3.statements.ModificationStatement.executeLocally(ModificationStatement.java:666) at org.apache.cassandra.cql3.QueryProcessor.executeInternal(QueryProcessor.java:447) at org.apache.cassandra.db.SystemKeyspace.persistSSTableReadMeter(SystemKeyspace.java:1488) local variable: java.lang.String#42806 local variable: java.lang.String#47659 local variable: org.apache.cassandra.io.sstable.UUIDBasedSSTableId#1389 local variable: org.apache.cassandra.metrics.RestorableMeter#752 at org.apache.cassandra.io.sstable.format.SSTableReader$GlobalTidy$1.run(SSTableReader.java:2170)
where we are waiting in the await() method of WaitQueue$Standard$SignalWithListener#2086139 object.
another input from a heap dump: WaitQueue$Standard$SignalWithListener#2086139 has the following field values based on the heap dump screenshots:
state = -1 (it means CANCELLED: org.apache.cassandra.utils.concurrent.WaitQueue.Standard#CANCELLED = -1)
thread = null
According to the code in WaitQueue.java such combination of values only possible if we invoked cancel() on this SignalWithListener object but we didn't do it in our code, it is in another branch of AbstractCommitLogService.awaitSyncAt
but let's check the await() logic more carefully:
public Signal await() throws InterruptedException { while (!isSignalled()) { checkInterrupted(); LockSupport.park(); <--- we are here } checkAndClear(); return this; }
and inspect unassuming checkInterrupted() method
private void checkInterrupted() throws InterruptedException { if (Thread.interrupted()) { cancel(); throw new InterruptedException(); } }
here we see that if the currently awaiting thread is interrupted then SignalWithListener cancels itself.
but we throw new InterruptedException() here, so the execution should fail with it.
Checking the thread stack trace and see that AbstractSignal.await() is invoked from org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259)
how it looks inside:
public static <A extends Awaitable> A awaitUninterruptibly(A await) { boolean interrupted = false; while (true) { try { await.await(); break; } catch (InterruptedException e) { interrupted = true; } } if (interrupted) Thread.currentThread().interrupt(); return await; }
Here we have a loop, if the current thread is interrupted - we have InterruptedException thrown by await.await(), we catch it here and try await.await() again. But in the previous invocation of await.await() we have cancelled our Signal object, so the condition inside await() method:
isSignalled(): state == SIGNALLED is always false now plus if another thread will invoke notifyAll - nothing will happen, because as a part of cancellation our Signal object unregister itself from the queue and the notifying thread will not able to find it to notify.
I have reproduced the case with a simple unit test.
The last element of the puzzle - why is the thread interrupted? To understand it - let's check there "read-hotness-tracker:1" thread execution started. It is a background periodical runnable in org.apache.cassandra.io.sstable.format.SSTableReader.GlobalTidy
readMeterSyncFuture = new WeakReference<>(syncExecutor.scheduleAtFixedRate(new Runnable() { public void run() { if (obsoletion == null) { meterSyncThrottle.acquire(); SystemKeyspace.persistSSTableReadMeter(desc.ksname, desc.cfname, desc.id, readMeter); } } }, 1, 5, TimeUnit.MINUTES));
let's check how we deal with readMeterSyncFuture ref and we see our interruption:
private void stopReadMeterPersistence() { ScheduledFuture<?> readMeterSyncFutureLocal = readMeterSyncFuture.get(); if (readMeterSyncFutureLocal != null) { readMeterSyncFutureLocal.cancel(true); <========= here we can interrupt the thread readMeterSyncFuture = NULL; } }
stopReadMeterPersistence() is invoked by one of the following branches of logic when an SSTable is not in use anymore:
org.apache.cassandra.io.sstable.format.SSTableReader#markObsolete org.apache.cassandra.db.lifecycle.Helpers#markObsolete org.apache.cassandra.db.lifecycle.Tracker#dropSSTables(...) org.apache.cassandra.db.lifecycle.LifecycleTransaction#doCommit org.apache.cassandra.db.lifecycle.LifecycleTransaction#doAbort
So,the issue is caused by a combination of events with a right timing:
- an SSTable which was become obsolete at the middle of persistSSTableReadMeter background job execution for this SSTable, so we interrupted the job
- persistSSTableReadMeter was at that moment in the middle of write operation, at commit log append step
- and the commit log append step was stuck due to a delay in commit log sync (lastSyncedAt < syncTime condition in AbstractCommitLogService), so we started to wait for it using await
Attachments
Attachments
Issue Links
- is caused by
-
CASSANDRA-16924 Blocking Concurrency Primitives
- Resolved
- links to