Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
3.14.0
-
None
-
Patch Available
-
Unknown
Description
Camel sometimes hangs (100% usage of 1 CPU) during shutdown. This occurs intermittently during integration testing. We use recipientList to perform some dynamic routing and many messages can be processed in parallel threads by recipientList and there are a few of them in different routes (this will be important in a moment).
Stack trace of stuck thread:
"SpringApplicationShutdownHook" #18 prio=5 os_prio=0 cpu=1784989.37ms elapsed=1790.21s tid=0x00007f7a13520000 nid=0xa80 runnable [0x00007f79a3af5000] java.lang.Thread.State: RUNNABLE at org.apache.camel.support.cache.ServicePool.stop(ServicePool.java:193) at org.apache.camel.support.cache.ServicePool$$Lambda$2274/0x000000084108fc40.accept(Unknown Source) at java.util.LinkedHashMap$LinkedValues.forEach(java.base@11.0.13/LinkedHashMap.java:608) at org.apache.camel.support.cache.ServicePool.doStop(ServicePool.java:181) at org.apache.camel.support.service.BaseService.stop(BaseService.java:160) - locked <0x00000000e3ece780> (a java.lang.Object) at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162) at org.apache.camel.support.cache.DefaultProducerCache.doStop(DefaultProducerCache.java:399) at org.apache.camel.support.service.BaseService.stop(BaseService.java:160) - locked <0x00000000e3ece740> (a java.lang.Object) at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162) at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147) at org.apache.camel.processor.RecipientList.doStop(RecipientList.java:222) at org.apache.camel.support.service.BaseService.stop(BaseService.java:160) - locked <0x00000000e3ecdb00> (a java.lang.Object) at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162) at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:165) at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147) at org.apache.camel.processor.Pipeline.doStop(Pipeline.java:226) at org.apache.camel.support.service.BaseService.stop(BaseService.java:160) - locked <0x00000000e3ecda48> (a java.lang.Object) at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162) at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:147) at org.apache.camel.impl.engine.DefaultChannel.doStop(DefaultChannel.java:134) at org.apache.camel.support.service.BaseService.stop(BaseService.java:160) - locked <0x00000000e3eccde0> (a java.lang.Object) at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162) at org.apache.camel.support.service.ServiceHelper.stopAndShutdownServices(ServiceHelper.java:257) at org.apache.camel.support.service.ServiceHelper.stopAndShutdownServices(ServiceHelper.java:215) at org.apache.camel.processor.errorhandler.RedeliveryErrorHandler.doShutdown(RedeliveryErrorHandler.java:1665) at org.apache.camel.support.ChildServiceSupport.shutdown(ChildServiceSupport.java:113) - locked <0x00000000e3ecc1e8> (a java.lang.Object) at org.apache.camel.support.service.ServiceHelper.stopAndShutdownService(ServiceHelper.java:233) at org.apache.camel.impl.engine.RouteService.stopChildServices(RouteService.java:409) at org.apache.camel.impl.engine.RouteService.doStop(RouteService.java:260) at org.apache.camel.support.ChildServiceSupport.stop(ChildServiceSupport.java:86) - locked <0x00000000e40873a0> (a java.lang.Object) at org.apache.camel.support.service.ServiceHelper.stopService(ServiceHelper.java:162) at org.apache.camel.support.service.ServiceHelper.stopAndShutdownService(ServiceHelper.java:227) at org.apache.camel.impl.engine.AbstractCamelContext.shutdownServices(AbstractCamelContext.java:3559) at org.apache.camel.impl.engine.AbstractCamelContext.shutdownServices(AbstractCamelContext.java:3584) at org.apache.camel.impl.engine.AbstractCamelContext.doStop(AbstractCamelContext.java:3366) at org.apache.camel.spring.boot.SpringBootCamelContext.doStop(SpringBootCamelContext.java:61) - locked <0x00000000e221e658> (a org.apache.camel.spring.boot.SpringBootCamelContext) at org.apache.camel.support.service.BaseService.stop(BaseService.java:160) - locked <0x00000000e221e8a8> (a java.lang.Object) at org.apache.camel.impl.engine.AbstractCamelContext.stop(AbstractCamelContext.java:2640) at org.apache.camel.spring.SpringCamelContext.stop(SpringCamelContext.java:128) at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:247) at org.springframework.context.support.DefaultLifecycleProcessor.access$300(DefaultLifecycleProcessor.java:54) at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:373) at org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:206) at org.springframework.context.support.DefaultLifecycleProcessor.onClose(DefaultLifecycleProcessor.java:129) at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1067) at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1021) - locked <0x00000000e11e3868> (a java.lang.Object) at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:137) at org.springframework.boot.SpringApplicationShutdownHook$$Lambda$2265/0x0000000841088440.accept(Unknown Source) at java.lang.Iterable.forEach(java.base@11.0.13/Iterable.java:75) at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:106) at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)
Deeper inspection of heap dump reveals that linked list in ServicePool.cache is corrupted and has a loop, which causes foreach to loop infinitely (see attached screenshot from heapdump analysis).
Access to ServicePool.cache is not synchronized even though LinkedHashMap is not thread-safe. There is at least one code path that can add to the cache it concurrently (and indeed in this seems to happen in our tests):
- org.apache.camel.processor.MulticastProcessor.process(Exchange, AsyncCallback)
- org.apache.camel.processor.RecipientListProcessor.createProcessorExchangePairs(Exchange)
- org.apache.camel.processor.RecipientListProcessor.doCreateProcessorExchangePairs(Exchange, Object, List<ProcessorExchangePair>, int)
- org.apache.camel.support.cache.DefaultProducerCache.acquireProducer(Endpoint)
- org.apache.camel.support.cache.ServicePool.acquire(Endpoint)
- org.apache.camel.support.cache.ServicePool.cache
There are also a few unsynchronized invocations of DefaultProducerCache.acquireProducer from other processors.
I am not sure if other usages of cache need to be synchronized, maybe Camel lifecycle provides enough serialization.
The simpliest solution is to wrap cache in Collections.synchronizedMap but this might have negative impact on performance.
Attachments
Attachments
Issue Links
- is related to
-
CAMEL-16093 camel-caffeine-lrucache - Deprecate and move out of core
- Resolved
- relates to
-
CAMEL-17544 ServicePool.doStop still hangs during shutdown
- Resolved
- links to