FELIX-836

Deadlocks caused by Declarative Services

    Description

      We occasionally experience deadlocks in Declarative Services caused by the interaction between the framework and SCR. They occur most often when multiple bundles are updated and packages are refreshed.

      One thread involved is the "SCR Component Actor" thread, which SCR uses to run asynchronous operations on registered components. The other is the framework's PackageAdmin thread, which stops and starts bundles during a refresh:

      ---- thread dump excerpt ----
      "SCR Component Actor" daemon prio=5 tid=0x0107f9c0 nid=0xab2400 in Object.wait() [0xb18a2000..0xb18a2d90]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x075e2d58> (a [Ljava.lang.Object;)
        at java.lang.Object.wait(Object.java:474)
        at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4167)
        - locked <0x075e2d58> (a [Ljava.lang.Object;)
        at org.apache.felix.framework.Felix.registerService(Felix.java:2665)
        at org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:254)
        at org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:232)
        at org.apache.sling.scripting.core.impl.ScriptEngineConsolePlugin.activate(ScriptEngineConsolePlugin.java:171)
        at org.apache.sling.scripting.core.impl.ScriptEngineConsolePlugin.initPlugin(ScriptEngineConsolePlugin.java:52)
        at org.apache.sling.scripting.core.impl.SlingScriptAdapterFactory.activate(SlingScriptAdapterFactory.java:223)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.felix.scr.impl.ImmediateComponentManager.createImplementationObject(ImmediateComponentManager.java:226)
        at org.apache.felix.scr.impl.ImmediateComponentManager.createComponent(ImmediateComponentManager.java:133)
        at org.apache.felix.scr.impl.AbstractComponentManager.activateInternal(AbstractComponentManager.java:476)
        - locked <0x0976ca30> (a org.apache.felix.scr.impl.ImmediateComponentManager)
        at org.apache.felix.scr.impl.AbstractComponentManager.enableInternal(AbstractComponentManager.java:398)
        at org.apache.felix.scr.impl.AbstractComponentManager.access$000(AbstractComponentManager.java:36)
        at org.apache.felix.scr.impl.AbstractComponentManager$1.run(AbstractComponentManager.java:99)
        at org.apache.felix.scr.impl.ComponentActorThread.run(ComponentActorThread.java:85)

      "FelixPackageAdmin" daemon prio=5 tid=0x0106f330 nid=0xaad000 waiting for monitor entry [0xb171f000..0xb171fd90]
        at org.apache.felix.scr.impl.AbstractComponentManager.deactivateInternal(AbstractComponentManager.java:540)
        - waiting to lock <0x0976ca30> (a org.apache.felix.scr.impl.ImmediateComponentManager)
        at org.apache.felix.scr.impl.AbstractComponentManager.disableInternal(AbstractComponentManager.java:579)
        at org.apache.felix.scr.impl.AbstractComponentManager.disposeInternal(AbstractComponentManager.java:616)
        at org.apache.felix.scr.impl.AbstractComponentManager.dispose(AbstractComponentManager.java:272)
        at org.apache.felix.scr.impl.ImmediateComponentManager.dispose(ImmediateComponentManager.java:120)
        at org.apache.felix.scr.impl.BundleComponentActivator.dispose(BundleComponentActivator.java:258)
        at org.apache.felix.scr.impl.Activator.disposeComponents(Activator.java:264)
        at org.apache.felix.scr.impl.Activator.bundleChanged(Activator.java:177)
        at org.apache.felix.framework.util.EventDispatcher.invokeBundleListenerCallback(EventDispatcher.java:690)
        at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:619)
        at org.apache.felix.framework.util.EventDispatcher.fireBundleEvent(EventDispatcher.java:532)
        at org.apache.felix.framework.Felix.fireBundleEvent(Felix.java:3601)
        at org.apache.felix.framework.Felix._stopBundle(Felix.java:1989)
        at org.apache.felix.framework.Felix.stopBundle(Felix.java:1954)
        at org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:3972)
        at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3257)
        at org.apache.felix.framework.PackageAdminImpl.run(PackageAdminImpl.java:259)
        at java.lang.Thread.run(Thread.java:613)
      ---- thread dump excerpt ----

      The problem is that SCR needs some internal synchronization but also registers a synchronous bundle listener, which it uses to learn about started bundles and bundles about to stop (the case in the "FelixPackageAdmin" stack above). Because the framework calls synchronous listeners while it is changing the bundle state, it holds its bundle locks during the callback. In the dump, the "FelixPackageAdmin" thread is inside that callback and waits for the ImmediateComponentManager monitor <0x0976ca30>, while the "SCR Component Actor" thread holds that monitor and waits in Felix.acquireBundleLock, so neither thread can proceed.

      To prevent this kind of deadlock, the bundle listener should probably dispose of the components asynchronously while the framework is stopping the bundles.
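
      The asynchronous hand-off proposed above could look roughly like the following plain-Java sketch. It does not use the OSGi or Felix SCR APIs: SketchComponentManager, SketchActorThread and onBundleStopping are hypothetical stand-ins for the component manager, the "SCR Component Actor" thread and the synchronous listener callback. The point it illustrates is that the listener only enqueues work and returns, so the calling thread never waits for the component manager's monitor while the framework holds its bundle locks.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for an SCR component manager; not the real Felix class.
class SketchComponentManager {
    private boolean disposed;

    // Guarded by the manager's own monitor, like the methods seen in the dump.
    synchronized void dispose() { disposed = true; }

    synchronized boolean isDisposed() { return disposed; }
}

// Hypothetical actor thread: runs queued tasks one at a time, in FIFO order.
class SketchActorThread extends Thread {
    private static final Runnable STOP = () -> { };
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    SketchActorThread() {
        super("Sketch Component Actor");
        setDaemon(true);
    }

    void schedule(Runnable task) { queue.add(task); }

    // Ask the thread to exit once all previously queued tasks have run.
    void terminate() { queue.add(STOP); }

    @Override
    public void run() {
        try {
            for (Runnable task = queue.take(); task != STOP; task = queue.take()) {
                task.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class AsyncDisposeSketch {
    // What a synchronous bundle listener would do when a bundle is stopping:
    // enqueue the disposal and return without touching the manager's monitor.
    static void onBundleStopping(SketchActorThread actor, SketchComponentManager manager) {
        actor.schedule(manager::dispose);
    }

    public static void main(String[] args) throws InterruptedException {
        SketchActorThread actor = new SketchActorThread();
        SketchComponentManager manager = new SketchComponentManager();
        actor.start();
        onBundleStopping(actor, manager);
        actor.terminate();
        actor.join();
        System.out.println("disposed = " + manager.isDisposed());
    }
}
```

      The trade-off is that the component may still be active for a short time after the listener returns, which is why this approach only suits the cases where disposal is not time critical.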

      Looking at when components are being disabled, we find four situations:

      (1) A bundle is stopped and its components must be disposed of. This may be done in the SCR thread, asynchronously to the actual bundle stop processing.
      (2) A factory configuration object of a ComponentFactory instance is deleted. This may also be done asynchronously because IMHO it is not time critical to dispose of an instance of a ComponentFactory in this situation.
      (3) A ComponentInstance is explicitly disposed of by calling the ComponentInstance.dispose() method. This request must be acted upon immediately and the component disposed of synchronously.
      (4) The Declarative Services bundle is stopped. Here we have to immediately dispose of all components in the system. We cannot do this asynchronously.
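
      The four situations above can be summarized as a small dispatch policy. The following plain-Java sketch is only an illustration of that mapping: DisposeTrigger, DisposeDispatcher and DisposePolicySketch are hypothetical names, not classes from Felix SCR or the attached patch, and the Executor stands in for the SCR actor thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical enumeration of the four situations listed in this issue.
enum DisposeTrigger {
    BUNDLE_STOPPING(true),                 // (1) may run on the SCR thread
    FACTORY_CONFIGURATION_DELETED(true),   // (2) not time critical
    COMPONENT_INSTANCE_DISPOSE(false),     // (3) must happen immediately
    SCR_BUNDLE_STOPPING(false);            // (4) must happen immediately

    final boolean asynchronous;

    DisposeTrigger(boolean asynchronous) {
        this.asynchronous = asynchronous;
    }
}

// Hypothetical dispatcher: defers or runs a disposal depending on the trigger.
class DisposeDispatcher {
    private final Executor actor;

    DisposeDispatcher(Executor actor) {
        this.actor = actor;
    }

    void dispose(DisposeTrigger trigger, Runnable disposal) {
        if (trigger.asynchronous) {
            actor.execute(disposal);   // hand off, return to the caller at once
        } else {
            disposal.run();            // dispose on the calling thread
        }
    }
}

class DisposePolicySketch {
    public static void main(String[] args) {
        // The "actor" here only collects tasks, so deferral is easy to observe.
        List<Runnable> actorQueue = new ArrayList<>();
        DisposeDispatcher dispatcher = new DisposeDispatcher(actorQueue::add);
        for (DisposeTrigger trigger : DisposeTrigger.values()) {
            int queuedBefore = actorQueue.size();
            dispatcher.dispose(trigger, () -> { });
            boolean deferred = actorQueue.size() > queuedBefore;
            System.out.println(trigger + " -> " + (deferred ? "asynchronous" : "synchronous"));
        }
    }
}
```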

      Attachments

        1. FELIX-836.patch
          8 kB
          Felix Meschberger

            People

              fmeschbe Felix Meschberger