Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Resource Resolver 1.0.2
JBoss
Description
We are intermittently seeing deadlocks while running a Sling-based webapp in an app server such as JBoss. The deadlock occurs between the FelixFrameworkWiring and FelixStartLevel threads.
For example, consider the order in which locks are taken in threaddump-1.log (excerpt shown below). The FelixFrameworkWiring thread holds the global bundle lock at the Felix level [1] and is waiting for the lock in ResourceResolverFactoryActivator.checkFactoryPreconditions. Meanwhile, the FelixStartLevel thread holds the lock on the ResourceResolverFactoryActivator and is waiting for the global lock. Each thread waits for a lock the other holds, which results in a deadlock.
The FelixFrameworkWiring thread [5] is busy deactivating components because of an earlier package refresh (which led to the repository being shut down and thus triggered deactivation of ResourceResolverFactoryActivator). Meanwhile, the FelixStartLevel thread [6] has activated ResourceResolverFactoryActivator (and thus holds its lock) and later requires the global lock for some operation.
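The lock-order inversion described above can be reproduced in isolation. The following is a minimal sketch, not Felix or Sling code: two plain Java monitors stand in for the Felix global lock and the ResourceResolverFactoryActivator instance, and the thread names only mimic those in the dump. Each thread takes one monitor and then asks for the other, and ThreadMXBean.findMonitorDeadlockedThreads() is used to confirm the cycle. Note that the Felix global lock appears to be implemented with wait()/notify() (the dump shows FelixStartLevel waiting inside Felix.acquireGlobalLock), so the JVM's monitor deadlock detection would likely not flag the real case; the sketch uses two monitors so that it does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public final class LockOrderDeadlockDemo {

    /**
     * Starts two daemon threads that acquire two monitors in opposite order and
     * returns how many threads the JVM reports as monitor-deadlocked (0 if none).
     */
    public static int reproduce() throws InterruptedException {
        // Illustrative stand-ins: globalLock ~ Felix global bundle lock,
        // activator ~ the ResourceResolverFactoryActivator instance.
        final Object globalLock = new Object();
        final Object activator = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread wiring = new Thread(() -> {
            synchronized (globalLock) {            // holds the global lock...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);   // ensure the other thread holds its lock
                synchronized (activator) {         // ...then blocks on the activator lock
                }
            }
        }, "FelixFrameworkWiring-sim");

        Thread startLevel = new Thread(() -> {
            synchronized (activator) {             // holds the activator lock...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (globalLock) {        // ...then blocks on the global lock
                }
            }
        }, "FelixStartLevel-sim");

        // Daemon threads, so the JVM can still exit although they never finish.
        wiring.setDaemon(true);
        startLevel.setDaemon(true);
        wiring.start();
        startLevel.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {            // poll for up to about 5 seconds
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + reproduce());
    }
}
```

Because the latch guarantees that each thread already holds its first monitor before requesting the second, the cycle forms on every run rather than intermittently, unlike the production case where it depends on timing.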
Looking at the code for ResourceResolverFactoryActivator.checkFactoryPreconditions [2], it appears to take and hold a lock (on this) while calling into the OSGi container. Such usage can lead to deadlocks like this one. It would therefore be better if ResourceResolverFactoryActivator did not hold any lock while calling container services [3].
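A minimal sketch of the suggested approach (often called an "open call"), assuming a much simplified activator. OpenCallActivator, Container, register()/unregister() and the boolean precondition argument are hypothetical stand-ins, not the Sling or OSGi API. A private lock guards only the activator's own state; every call into the container happens after the synchronized block has exited, so the activator never waits for a framework lock while holding its own, which removes one edge of the cycle seen in the thread dump.

```java
/**
 * Hypothetical sketch: state is decided under a private lock, but calls into the
 * container (foreign code) are made only after that lock has been released.
 */
public final class OpenCallActivator {

    /** Hypothetical stand-in for container services such as service registration. */
    public interface Container {
        Object register();
        void unregister(Object registration);
    }

    private final Object lock = new Object();
    private final Container container;
    private boolean wanted;        // most recent precondition result
    private boolean registering;   // a register() call is in flight
    private Object registration;   // non-null while registered

    public OpenCallActivator(Container container) {
        this.container = container;
    }

    /** True if the calling thread holds this activator's private lock (for checking the pattern). */
    public boolean holdsLock() {
        return Thread.holdsLock(lock);
    }

    public void checkFactoryPreconditions(boolean preconditionsMet) {
        boolean doRegister = false;
        Object toUnregister = null;
        synchronized (lock) {                       // decide only; no foreign calls here
            wanted = preconditionsMet;
            if (preconditionsMet && registration == null && !registering) {
                registering = true;
                doRegister = true;
            } else if (!preconditionsMet && registration != null) {
                toUnregister = registration;
                registration = null;
            }
        }
        if (toUnregister != null) {
            container.unregister(toUnregister);     // foreign call, lock released
        }
        if (doRegister) {
            Object reg = container.register();      // foreign call, lock released
            boolean stale;
            synchronized (lock) {
                registering = false;
                stale = !wanted;                    // preconditions changed meanwhile
                if (!stale) {
                    registration = reg;
                }
            }
            if (stale) {
                container.unregister(reg);          // foreign call, lock released
            }
        }
    }

    public boolean isRegistered() {
        synchronized (lock) {
            return registration != null;
        }
    }
}
```

The wanted and registering flags cover the case where the preconditions change while register() is still running, which becomes possible once the lock is no longer held across the call. Error handling (for example a register() that throws) is left out to keep the sketch short.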
"FelixFrameworkWiring"
- locked <0x00000007944da478> (a java.util.concurrent.atomic.AtomicReference) org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- locked <0x00000007944da9b0> (a java.util.concurrent.atomic.AtomicReference) org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- locked <0x00000007944dae38> (a java.util.concurrent.atomic.AtomicReference) org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- locked <0x0000000796d5d030> (a java.util.concurrent.atomic.AtomicReference) org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- waiting to lock <0x000000079624ff08> (a org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator) org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.checkFactoryPreconditions(ResourceResolverFactoryActivator.java:330)
"FelixStartLevel"
- locked <0x000000079624ff08> (a org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator) org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.checkFactoryPreconditions(ResourceResolverFactoryActivator.java:324)
- locked <0x0000000796959bc8> (a java.util.concurrent.atomic.AtomicReference) org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
- locked <0x0000000796959eb8> (a java.util.concurrent.atomic.AtomicReference) org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
- locked <0x000000079695a188> (a java.util.concurrent.atomic.AtomicReference) org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
- waiting on <0x000000079415eca0> (a [Ljava.lang.Object;) org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:5019)
[1] This was confirmed via the value of m_globalLockThread of the Felix instance in the heap dump.
[2] https://github.com/apache/sling/blob/trunk/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/ResourceResolverFactoryActivator.java#L313
[3] http://njbartlett.name/files/osgibook_preview_20091217.pdf (Section 6.4 Don’t Hold Locks when Calling Foreign Code)