Details
Improvement
Status: Resolved
Major
Resolution: Not A Problem
1.40.0
None
None
Description
I am analyzing a situation where hundreds of threads have a stack trace like this:
"74.118.98.131 [1643218398059] GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json HTTP/1.1" #58 prio=5 os_prio=0 cpu=26006.94ms elapsed=1206.98s tid=0x0000560a69765000 nid=0x138d waiting for monitor entry [0x00007f8f1b15b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.access(CacheLIRS.java:910)
    - waiting to lock <0x00000006a1c5c1a0> (a org.apache.jackrabbit.oak.cache.CacheLIRS$Segment)
    at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.get(CacheLIRS.java:893)
    at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.get(CacheLIRS.java:958)
    at org.apache.jackrabbit.oak.cache.CacheLIRS.get(CacheLIRS.java:299)
    at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.getNode(DocumentNodeStore.java:1271)
    at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore$8.apply(DocumentNodeStore.java:1449)
    at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore$8.apply(DocumentNodeStore.java:1445)
Checking the code at [1], the most basic Java synchronization mechanism (the synchronized keyword) is used. According to this DZone article [2], this can be problematic: whenever a thread leaves such a synchronized block, all threads waiting for the lock are woken up, but only one of them can enter the section; the others are sent back to sleep. The article recommends using a ReentrantReadWriteLock instead, which is much smarter and wakes up just one thread.
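To make the suggestion concrete, here is a minimal sketch of a map guarded by a ReentrantReadWriteLock instead of synchronized. The class RwLockedSegment and its methods are hypothetical names used for illustration only; this is not the actual CacheLIRS.Segment code. It also assumes that lookups do not modify shared state, which may not hold for a cache that updates recency information on every access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a cache segment; NOT the real CacheLIRS.Segment.
class RwLockedSegment<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers share the read lock, so concurrent lookups do not block each other.
    // Assumption: a lookup does not mutate the map or any other shared state.
    V get(K key) {
        lock.readLock().lock();
        try {
            return entries.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock; readers and other writers wait.
    V put(K key, V value) {
        lock.writeLock().lock();
        try {
            return entries.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, a call such as new RwLockedSegment<String, Integer>().get("key") only contends with writers. If a lookup has to update internal bookkeeping, it would need the write lock (or a plain ReentrantLock), and the benefit over synchronized would be smaller.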
During that situation I saw very high CPU usage, which I cannot explain: the thread dumps showed hardly any other thread doing work, while the vast majority were blocked as shown above.
While I think that such an improvement might not have fully avoided the problem I am facing, I think that such an optimization is still useful. This is a heavily used code path, and if there is a way to reduce the overhead of the locking itself, it would be highly useful.