Uploaded image for project: 'Camel'
  1. Camel
  2. CAMEL-17536

ServicePool.doStop hangs during shutdown

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.14.0
    • 3.14.1, 3.15.0
    • camel-core
    • None
    • Patch Available
    • Unknown

    Description

      Camel sometimes hangs (100% usage of 1 CPU) during shutdown. This occurs intermittently during integration testing. We use recipientList to perform some dynamic routing and many messages can be processed in parallel threads by recipientList and there are a few of them in different routes (this will be important in a moment).

      Stack trace of stuck thread:

      "SpringApplicationShutdownHook" #18 prio=5 os_prio=0 cpu=1784989.37ms elapsed=1790.21s tid=0x00007f7a13520000 nid=0xa80 runnable  [0x00007f79a3af5000]
         java.lang.Thread.State: RUNNABLE
          at org.apache.camel.support.cache.ServicePool.stop(ServicePool.java:193)
          at org.apache.camel.support.cache.ServicePool$$Lambda$2274/0x000000084108fc40.accept(Unknown Source)
          at java.util.LinkedHashMap$LinkedValues.forEach(java.base@11.0.13/LinkedHashMap.java:608)
          at org.apache.camel.support.cache.ServicePool.doStop(ServicePool.java:181)
          at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
          - locked <0x00000000e3ece780> (a java.lang.Object)
          at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
          at org.apache.camel.support.cache.DefaultProducerCache.doStop(DefaultProducerCache.java:399)
          at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
          - locked <0x00000000e3ece740> (a java.lang.Object)
          at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
          at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
          at org.apache.camel.processor.RecipientList.doStop(RecipientList.java:222)
          at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
          - locked <0x00000000e3ecdb00> (a java.lang.Object)
          at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
          at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:165)
          at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
          at org.apache.camel.processor.Pipeline.doStop(Pipeline.java:226)
          at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
          - locked <0x00000000e3ecda48> (a java.lang.Object)
          at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
          at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147)
          at org.apache.camel.impl.engine.DefaultChannel.doStop(DefaultChannel.java:134)
          at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
          - locked <0x00000000e3eccde0> (a java.lang.Object)
          at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
          at org.apache.camel.support.service.ServiceHelper.stopAndShutdownServices(ServiceHelper.java:257)
          at org.apache.camel.support.service.ServiceHelper.stopAndShutdownServices(ServiceHelper.java:215)
          at org.apache.camel.processor.errorhandler.RedeliveryErrorHandler.doShutdown(RedeliveryErrorHandler.java:1665)
          at org.apache.camel.support.ChildServiceSupport.shutdown(ChildServiceSupport.java:113)
          - locked <0x00000000e3ecc1e8> (a java.lang.Object)
          at org.apache.camel.support.service.ServiceHelper.stopAndShutdownService(ServiceHelper.java:233)
          at org.apache.camel.impl.engine.RouteService.stopChildServices(RouteService.java:409)
          at org.apache.camel.impl.engine.RouteService.doStop(RouteService.java:260)
          at org.apache.camel.support.ChildServiceSupport.stop(ChildServiceSupport.java:86)
          - locked <0x00000000e40873a0> (a java.lang.Object)
          at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162)
          at org.apache.camel.support.service.ServiceHelper.stopAndShutdownService(ServiceHelper.java:227)
          at org.apache.camel.impl.engine.AbstractCamelContext.shutdownServices(AbstractCamelContext.java:3559)
          at org.apache.camel.impl.engine.AbstractCamelContext.shutdownServices(AbstractCamelContext.java:3584)
          at org.apache.camel.impl.engine.AbstractCamelContext.doStop(AbstractCamelContext.java:3366)
          at org.apache.camel.spring.boot.SpringBootCamelContext.doStop(SpringBootCamelContext.java:61)
          - locked <0x00000000e221e658> (a org.apache.camel.spring.boot.SpringBootCamelContext)
          at org.apache.camel.support.service.BaseService.stop(BaseService.java:160)
          - locked <0x00000000e221e8a8> (a java.lang.Object)
          at org.apache.camel.impl.engine.AbstractCamelContext.stop(AbstractCamelContext.java:2640)
          at org.apache.camel.spring.SpringCamelContext.stop(SpringCamelContext.java:128)
          at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:247)
          at org.springframework.context.support.DefaultLifecycleProcessor.access$300(DefaultLifecycleProcessor.java:54)
          at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:373)
          at org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:206)
          at org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:129)
          at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1067)
          at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1021)
          - locked <0x00000000e11e3868> (a java.lang.Object)
          at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:137)
          at org.springframework.boot.SpringApplicationShutdownHook$$Lambda$2265/0x0000000841088440.accept(Unknown Source)
          at java.lang.Iterable.forEach(java.base@11.0.13/Iterable.java:75)
          at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:106)
          at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)

      Deeper inspection of heap dump reveals that linked list in ServicePool.cache is corrupted and has a loop, which causes foreach to loop infinitely (see attached screenshot from heapdump analysis).

      Access to ServicePool.cache is not synchronized even though LinkedHashMap is not thread-safe. There is at least one code path that can add to the cache it concurrently (and indeed in this seems to happen in our tests):

      1. org.apache.camel.processor.MulticastProcessor.process(Exchange, AsyncCallback)
      2. org.apache.camel.processor.RecipientListProcessor.createProcessorExchangePairs(Exchange)
      3. org.apache.camel.processor.RecipientListProcessor.doCreateProcessorExchangePairs(Exchange, Object, List<ProcessorExchangePair>, int)
      4. org.apache.camel.support.cache.DefaultProducerCache.acquireProducer(Endpoint)
      5. org.apache.camel.support.cache.ServicePool.acquire(Endpoint)
      6. org.apache.camel.support.cache.ServicePool.cache

      There are also a few unsynchronized invocations of DefaultProducerCache.acquireProducer from other processors.

      I am not sure if other usages of cache need to be synchronized, maybe Camel lifecycle provides enough serialization.
       
      The simpliest solution is to wrap cache in Collections.synchronizedMap but this might have negative impact on performance.

      Attachments

        1. linkedhashmap corrupted.png
          77 kB
          Krzysztof Jamróz

        Issue Links

          Activity

            People

              davsclaus Claus Ibsen
              krzysztofjamroz Krzysztof Jamróz
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: