Uploaded image for project: 'Karaf'
  1. Karaf
  2. KARAF-2256

Deadlock when refreshing bundles

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Cannot Reproduce
    • 2.3.1
    • None
    • None
    • None
    • 64-bit Linux Oracle JDK 1.7.0_17

    Description

      When attempting to install the DOSGi feature (by running "features:chooseurl cxf-dosgi 1.4.0" and "features:install cxf-dosgi-discovery-distributed"), the installation hangs along with some of the bundles which can no longer be started, stopped, checked for imports, etc. - the Karaf server must be killed and restarted to resume. This is likely not related to this specific feature, and can happen with other refreshed bundles and installed features as well.

      At a glance it seems like this is caused by the "OPS4J Pax Web - Runtime (1.1.12)" bundle being stuck in the stopping state due to a deadlock caused by its Activator:

      It receives a removedService notification from a service tracker, which is handled in a separate thread using a custom executor and eventually tries to resolve some bundle and ends up waiting for acquireGlobalLock indefinitely.

      This is because at the same time, Felix calls refreshPackages which attempts to stop the bundle (while holding the lock), whose activator puts a cleanup task in its custom executor and then attempts to shut down the executor. This never happens, because the previous executor task initiated from removeService is waiting for the lock, hence the deadlock.

      I'm not entirely sure which of the projects has the underlying bug in it - probably pax web, possibly Felix if the OSGi specs allow for the behavior that hangs it, but in any case Karaf is using these versions and exhibiting the deadlock, so at the very least should upgrade to fixed versions of these libraries, or patch them.

      If anyone who knows these systems better thinks it should be reported in one of the upstream projects, point me in the right direction and I'll be happy to do it.

      Here is the thread dump, the top two threads show the deadlock, and the other two are bundles which are stuck as well due to waiting for the same lock (I think).

      "FelixFrameworkWiring" daemon prio=10 tid=0x00007f390002e000 nid=0x35d1 in Object.wait() [0x00007f3948dd3000]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x00000000f1b8ea50> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:503)
        at org.ops4j.pax.web.service.internal.Executor.shutdown(Executor.java:91)
      • locked <0x00000000f1b8ea50> (a java.lang.Object)
        at org.ops4j.pax.web.service.internal.Activator.stop(Activator.java:140)
        at org.apache.felix.framework.util.SecureAction.stopActivator(SecureAction.java:667)
        at org.apache.felix.framework.Felix.stopBundle(Felix.java:2361)
        at org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:4629)
        at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3951)
        at org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
        at java.lang.Thread.run(Thread.java:722)

      "Pax Web Runtime worker" daemon prio=10 tid=0x00007f3904263000 nid=0x370a in Object.wait() [0x00007f390dfa8000]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x00000000e990e018> (a [Ljava.lang.Object
        at java.lang.Object.wait(Object.java:503)
        at org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:4944)
      • locked <0x00000000e990e018> (a [Ljava.lang.Object
        at org.apache.felix.framework.StatefulResolver.resolve(StatefulResolver.java:219)
        at org.apache.felix.framework.BundleWiringImpl.searchDynamicImports(BundleWiringImpl.java:1539)
        at org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1439)
        at org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:72)
        at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1843)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at org.apache.felix.framework.BundleWiringImpl.getClassByDelegation(BundleWiringImpl.java:1317)
        at org.apache.felix.framework.ServiceRegistrationImpl$ServiceReferenceImpl.isAssignableTo(ServiceRegistrationImpl.java:521)
        at org.apache.felix.framework.util.Util.isServiceAssignable(Util.java:280)
        at org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:916)
        at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
        at org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
        at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
        at org.apache.felix.framework.Felix.access$000(Felix.java:74)
        at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
        at org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
        at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
        at org.ops4j.pax.web.service.internal.Activator.updateController(Activator.java:231)
        at org.ops4j.pax.web.service.internal.Activator$DynamicsServiceTrackerCustomizer$2.run(Activator.java:387)
        at org.ops4j.pax.web.service.internal.Executor$Future.run(Executor.java:45)
        at org.ops4j.pax.web.service.internal.Executor$Worker.run(Executor.java:122)

      "fileinstall-/opt/apache-karaf-2.3.1/deploy" daemon prio=10 tid=0x00007f3904018800 nid=0x35a8 in Object.wait() [0x00007f394aba8000]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x00000000e990e018> (a [Ljava.lang.Object
        at java.lang.Object.wait(Object.java:503)
        at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
      • locked <0x00000000e990e018> (a [Ljava.lang.Object
        at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
        at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
        at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundle(DirectoryWatcher.java:1247)
        at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundles(DirectoryWatcher.java:1219)
        at org.apache.felix.fileinstall.internal.DirectoryWatcher.startAllBundles(DirectoryWatcher.java:1208)
        at org.apache.felix.fileinstall.internal.DirectoryWatcher.process(DirectoryWatcher.java:503)
        at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:291)

      "NioProcessor-2" prio=10 tid=0x00007f3914014000 nid=0x35fd in Object.wait() [0x00007f394a064000]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x00000000e990e018> (a [Ljava.lang.Object
        at java.lang.Object.wait(Object.java:503)
        at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
      • locked <0x00000000e990e018> (a [Ljava.lang.Object
        at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
        at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
        at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:931)
        at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeatures(FeaturesServiceImpl.java:479)
        at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:396)
        at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:392)
        at org.apache.karaf.features.command.InstallFeatureCommand.doExecute(InstallFeatureCommand.java:62)
        at org.apache.karaf.features.command.FeaturesCommandSupport.doExecute(FeaturesCommandSupport.java:41)
        at org.apache.karaf.shell.console.OsgiCommandSupport.execute(OsgiCommandSupport.java:38)
        at org.apache.felix.gogo.commands.basic.AbstractCommand.execute(AbstractCommand.java:35)
        at org.apache.felix.gogo.runtime.CommandProxy.execute(CommandProxy.java:78)
        at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:474)
        at org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:400)
        at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108)
        at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:183)
        at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:120)
        at org.apache.felix.gogo.runtime.CommandSessionImpl.execute(CommandSessionImpl.java:89)
        at org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand$1.run(ShellCommandFactory.java:109)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand.start(ShellCommandFactory.java:107)
        at org.apache.sshd.server.channel.ChannelSession.handleExec(ChannelSession.java:388)
        at org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:235)
        at org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:195)
        at org.apache.sshd.common.session.AbstractSession.channelRequest(AbstractSession.java:1057)
        at org.apache.sshd.server.session.ServerSession.running(ServerSession.java:229)
        at org.apache.sshd.server.session.ServerSession.handleMessage(ServerSession.java:205)
        at org.apache.sshd.common.session.AbstractSession.decode(AbstractSession.java:566)
        at org.apache.sshd.common.session.AbstractSession.messageReceived(AbstractSession.java:236)
      • locked <0x00000000efd56b00> (a java.lang.Object)
        at org.apache.sshd.common.AbstractSessionIoHandler.messageReceived(AbstractSessionIoHandler.java:58)
        at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690)
        at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
        at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
        at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
        at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
        at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
        at org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:410)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:710)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

      Attachments

        Activity

          People

            achim_nierbeck Achim Nierbeck
            amichai Amichai Rothman
            Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: