Details
- Type: Bug
- Status: Resolved
- Priority: Normal
- Resolution: Fixed
- Category: Degradation - Performance Bug/Regression
- Severity: Normal
- Discovered By: Performance Regression Test
- Platform: All
Description
jasonstack and I have been investigating a performance regression affecting 4.0 during a 3-node, RF=3 write-throughput test with a timeseries-like workload (see the attached plot, where blue is 3.11 and orange is 4.0).
It's been a bit of a long investigation, but two clues ended up standing out:
1) An abnormal number of expired messages on 4.0 (see the attached system log), while 3.11 has almost none.
2) Abnormally high GC activity (see the attached GC log).
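(As the next paragraph explains, the second clue came down to strings being built for every expired message. The snippet below is a minimal, self-contained sketch of that allocation pattern and of the obvious way to avoid it; all names are hypothetical and this is not the actual Cassandra code.)

import java.util.concurrent.ThreadLocalRandom;

public class ExpiredCallbackAllocation
{
    // Eager variant: a fresh id string is built for every expired message,
    // even though the rate-limited log line almost never fires.
    static void onExpiredEager(long messageId, String verb)
    {
        String id = messageId + "@" + verb;   // allocates on every expiration
        if (shouldLog())
            System.out.println("expired: " + id);
    }

    // Deferred variant: check the cheap rate-limit condition first and only
    // build the string when the log line will actually be emitted.
    static void onExpiredDeferred(long messageId, String verb)
    {
        if (shouldLog())
            System.out.println("expired: " + messageId + "@" + verb);
    }

    // Stand-in for a "no-spam" logger's rate limiting: lets ~1 in 10,000 through.
    static boolean shouldLog()
    {
        return ThreadLocalRandom.current().nextInt(10_000) == 0;
    }

    public static void main(String[] args)
    {
        for (long i = 0; i < 1_000_000; i++)
        {
            onExpiredEager(i, "MUTATION");    // millions of short-lived strings
            onExpiredDeferred(i, "MUTATION"); // almost no garbage
        }
    }
}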
It turns out the two are related: the on-expired callback creates a huge number of strings in the id() call. The next question is what causes all those message expirations. We thoroughly reviewed the internode messaging code, and the only issue we could find so far is related to the "batch pruning" calls here and here: it seems too much time is spent on those, causing the event loop to fall behind in processing the rest of the messages, which then end up expired. This is supported by the analysis of the collapsed stacks (after fixing the GC issue):
(tprint (top-aggregated-calls oss40nogc "EventLoopDelivery:doRun" 5))
org/apache/cassandra/net/OutboundConnection$EventLoopDelivery:doRun 3456
org/apache/cassandra/net/OutboundMessageQueue:access$600 1621
org/apache/cassandra/net/PrunableArrayQueue:prune 1621
org/apache/cassandra/net/OutboundMessageQueue$WithLock:close 1621
org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1620
Those are the top 5 sampled calls from EventLoopDelivery#doRun(), which spends roughly half of its time pruning. But only a tiny portion of that pruning time is spent actually expiring:
(tprint (top-aggregated-calls oss40nogc "OutboundMessageQueue:pruneInternalQueueWithLock" 5))
org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1900
org/apache/cassandra/net/PrunableArrayQueue:prune 1894
org/apache/cassandra/net/OutboundMessageQueue$1Pruner:onPruned 147
org/apache/cassandra/net/OutboundConnection$$Lambda$444/740904487:accept 147
org/apache/cassandra/net/OutboundConnection:onExpired 147
And indeed, the PrunableArrayQueue:prune() self time is dominant:
(tprint (top-self-calls oss40nogc "PrunableArrayQueue:prune" 5))
org/apache/cassandra/net/PrunableArrayQueue:prune 1718
org/apache/cassandra/net/OutboundConnection:releaseCapacity 27
java/util/concurrent/ConcurrentHashMap:replaceNode 19
java/util/concurrent/ConcurrentLinkedQueue:offer 16
java/util/concurrent/LinkedBlockingQueue:offer 15
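To make the cost concrete: a batch prune is essentially a full walk of the outbound backlog, touching every pending message whether or not it has expired. A minimal sketch of that pattern (hypothetical names and a plain ArrayDeque, not the real PrunableArrayQueue) looks like this:

import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical sketch, not the real PrunableArrayQueue: a queue of messages with
// per-message expiration times and a batch prune pass over the whole backlog.
final class SketchMessageQueue
{
    static final class Entry
    {
        final String message;
        final long expiresAtNanos;
        Entry(String message, long expiresAtNanos) { this.message = message; this.expiresAtNanos = expiresAtNanos; }
    }

    private final ArrayDeque<Entry> queue = new ArrayDeque<>();

    void add(String message, long expiresAtNanos)
    {
        queue.add(new Entry(message, expiresAtNanos));
    }

    // A prune pass visits every queued entry, so its cost is O(backlog size)
    // even when few or no entries are actually expired, which matches the
    // profile above: lots of prune self time, little time spent in onExpired.
    int prune(long nowNanos, Consumer<String> onExpired)
    {
        int pruned = 0;
        for (Iterator<Entry> it = queue.iterator(); it.hasNext();)
        {
            Entry e = it.next();
            if (e.expiresAtNanos <= nowNanos)
            {
                it.remove();
                onExpired.accept(e.message);
                pruned++;
            }
        }
        return pruned;
    }
}

If a pass like this runs on the event loop for every delivery cycle, its cost scales with whatever backlog builds up under a write-heavy workload, which would explain the event loop falling behind.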
That said, before proceeding with a PR to fix these issues, I'd like to understand: what's the reason to prune so often, rather than only when polling messages during delivery? If there's a reason I'm missing, let's talk about how to optimize pruning; otherwise, let's get rid of it.
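For illustration, the alternative the question alludes to, expiring lazily as a side effect of polling during delivery rather than via separate prune passes, could look roughly like this (again hypothetical names, a sketch rather than a proposed patch):

import java.util.ArrayDeque;
import java.util.function.Consumer;

// Hypothetical sketch: expiration handled lazily at poll time, no batch prune pass.
final class LazySketchQueue
{
    static final class Entry
    {
        final String message;
        final long expiresAtNanos;
        Entry(String message, long expiresAtNanos) { this.message = message; this.expiresAtNanos = expiresAtNanos; }
    }

    private final ArrayDeque<Entry> queue = new ArrayDeque<>();

    void add(String message, long expiresAtNanos)
    {
        queue.add(new Entry(message, expiresAtNanos));
    }

    // Skip (and report) expired entries while looking for the next live one, so the
    // expiration work is folded into delivery instead of a separate O(n) pass.
    String poll(long nowNanos, Consumer<String> onExpired)
    {
        Entry e;
        while ((e = queue.poll()) != null)
        {
            if (e.expiresAtNanos > nowNanos)
                return e.message;        // still live: deliver it
            onExpired.accept(e.message); // expired: report and keep polling
        }
        return null;                     // queue drained
    }
}

One obvious trade-off, hinted at by releaseCapacity showing up in the prune profile above, is that with purely lazy expiration a stalled connection would keep expired messages (and whatever capacity they reserve) in the queue until something polls it; whether that is the actual reason for the eager pruning is exactly the question above.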
Attachments
Issue Links
- is depended upon by: CASSANDRA-14747 Test Messaging Refactor with: 200 node, compression=none, encryption=none, coalescing=off (Open)
- relates to: CASSANDRA-15766 NoSpamLogger arguments building objects on hot paths (Resolved)
- links to