Bug 46264 - Shutting down tomcat with large number of contexts is slow
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 6
Classification: Unclassified
Component: Catalina
Version: 6.0.18
Hardware: PC Linux
Importance: P2 enhancement with 1 vote
Target Milestone: default
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-11-21 08:52 UTC by Joe Kislo
Modified: 2012-02-13 20:41 UTC



Attachments
Proposed patch (5.08 KB, patch)
2008-11-21 09:04 UTC, Joe Kislo
starting contexts in parallel using an executor (12.94 KB, patch)
2011-10-10 17:42 UTC, Felix Schumacher
Threaded start, stop and deployment for Contexts (17.65 KB, patch)
2011-10-11 13:39 UTC, Mark Thomas
starting contexts in parallel using an executor (18.71 KB, patch)
2011-10-11 13:53 UTC, Felix Schumacher
make ContextConfig threadsafe (1.08 KB, patch)
2011-10-11 14:49 UTC, Felix Schumacher
Threaded start, stop and deployment for Contexts (33.42 KB, patch)
2011-10-11 17:12 UTC, Mark Thomas
Threaded start, stop and deployment for Contexts (49.33 KB, patch)
2011-10-13 10:37 UTC, Mark Thomas
Threaded start, stop and deployment for Contexts (47.75 KB, patch)
2011-10-13 12:46 UTC, Mark Thomas
Threaded start, stop and deployment for Contexts (999.07 KB, patch)
2011-10-13 23:19 UTC, Mark Thomas
Threaded start, stop and deployment for Contexts (58.92 KB, patch)
2011-10-25 17:27 UTC, Mark Thomas

Description Joe Kislo 2008-11-21 08:52:10 UTC
Shutting down tomcat with large number of contexts is slow

On some of our sandbox testing environments, we have Tomcat loaded with 30-40 contexts and run a very large heap (2-3GB).  Most of these contexts are large applications which take anywhere from 5-10 seconds each to shut down.  Most of the time spent shutting down each application is not spent using the local app server CPU, but shutting down remote resources (announcing over JMS that the application is going down, flushing write buffers, closing DB connections, closing JMS connections, closing log connections, etc.).  Shutting down Tomcat typically takes minutes, because it shuts down one context at a time.
Comment 1 Joe Kislo 2008-11-21 09:04:01 UTC
I propose that during the context shutdown, Tomcat use multiple threads to shut down the contexts in parallel.

I have attached a patch which starts (2 x number of cores) threads during the shutdown process. The threads work through the contexts in FIFO order, shutting them down in parallel.  I suspect that even when shutting down contexts which are entirely local in nature (not using resources on remote systems) on a single-CPU machine, a parallel shutdown will be faster, because the various waits, sleeps and other calls made in the destroy() method of servlets can then overlap.

Comment 2 Joe Kislo 2008-11-21 09:04:50 UTC
Created attachment 22912 [details]
Proposed patch
Comment 3 Joe Kislo 2010-11-29 15:14:59 UTC
I can confirm my patch still works on Tomcat 6.0.29
Comment 4 Pid 2010-11-30 01:41:00 UTC
(In reply to comment #3)
> I can confirm my patch still works on Tomcat 6.0.29

Would the java.util.concurrent package not provide a more elegant way of solving this problem?
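
For illustration, a minimal sketch of what a java.util.concurrent based parallel stop could look like. It is not taken from any of the attached patches: the class name, the stopAll method and the use of plain Runnable objects as stand-ins for context stop calls are all assumptions. Only the pool size of 2 x cores in main comes from comment 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper, not Tomcat code: runs stop tasks on a fixed-size pool. */
final class ParallelStopSketch {

    /** Runs every task on a pool of the given size; returns how many completed normally. */
    static int stopAll(List<Runnable> stopTasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> results = new ArrayList<>();
        try {
            // The pool's internal queue hands tasks to idle workers in FIFO order.
            for (Runnable task : stopTasks) {
                results.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<?> result : results) {
                try {
                    result.get(); // wait until this stop task has finished
                    completed++;
                } catch (ExecutionException e) {
                    // One failing context should not prevent the others from stopping.
                    System.err.println("Stop failed: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Pool size suggested in comment 1: twice the number of cores.
        int threads = 2 * Runtime.getRuntime().availableProcessors();
        AtomicInteger stopped = new AtomicInteger();
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            tasks.add(stopped::incrementAndGet); // stand-in for a slow context stop
        }
        System.out.println(stopAll(tasks, threads) + " stop tasks completed");
    }
}
```

Because the workers pull from a shared queue, a slow context only occupies one thread while the remaining threads keep draining the queue.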
Comment 5 Mark Thomas 2011-10-09 16:31:14 UTC
A few comments on the patch.

1. Consider allowing the number of threads to be configured (probably as an attribute of the host).

2. Webapp start/stop time can vary widely. A more efficient solution would be to put all the webapps in a queue and have worker threads remove them one at a time.

3. Both start and stop need to be addressed.
Comment 6 Felix Schumacher 2011-10-10 17:42:06 UTC
Created attachment 27755 [details]
starting contexts in parallel using an executor

While this patch is not really for stopping contexts, but for starting them in parallel, it might be useful nonetheless.

There are two ways to configure the number of threads for parallel deployment. First, the Host element in server.xml can be extended with the new attribute parallelDeployment; a value greater than 0 will be used. If no valid value is given that way, the system property hostConfig.parallelDeploymentCount is checked; again, a value greater than zero is valid. If there is still no valid value, Runtime is asked for the number of available processors.
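
The lookup order described above could be sketched roughly as follows. This is an illustration rather than the code in the attachment: the class and method names are hypothetical, and only the system property name is taken from the description.

```java
/** Hypothetical sketch of the fallback order described in comment 6. */
final class DeployThreadsSketch {

    /** System property name as given in the comment. */
    static final String PROPERTY = "hostConfig.parallelDeploymentCount";

    /**
     * @param hostAttribute value of the Host attribute, or a value <= 0 if unset or invalid
     * @return the number of deployment threads to use
     */
    static int resolve(int hostAttribute) {
        // 1. A positive Host attribute wins.
        if (hostAttribute > 0) {
            return hostAttribute;
        }
        // 2. Otherwise a positive system property is used.
        //    Integer.getInteger returns null if the property is missing or not a number.
        Integer fromProperty = Integer.getInteger(PROPERTY);
        if (fromProperty != null && fromProperty > 0) {
            return fromProperty;
        }
        // 3. Last resort: one thread per available processor.
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Deployment threads: " + resolve(0));
    }
}
```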
Comment 7 Rainer Jung 2011-10-10 18:05:29 UTC
Hi Felix,

don't want to split hairs or paint bikesheds but the attribute name seems problematic: in TC 7 we call "parallel deployment" the possibility to serve multiple versions of the same context in parallel by deploying versioned contexts.

Something like startupConcurrency might be better (and I think the fact that it would also be used for shutdown is not a big deal).

Regards,

Rainer
Comment 8 Mark Thomas 2011-10-11 11:02:10 UTC
I'm currently working on combining these two patches into a complete solution that covers multi-threaded deployment, and container start and stop.
Comment 9 Mark Thomas 2011-10-11 13:39:57 UTC
Created attachment 27758 [details]
Threaded start, stop and deployment for Contexts

This proposed patch (against trunk) provides threaded start/stop for Contexts and Hosts and threaded deployment for Contexts. It builds on the previous suggested patches and the discussion on the users mailing list.

The patch is provided for review and feedback. It will be amended or committed based on the feedback received.
Comment 10 Konstantin Kolinko 2011-10-11 13:48:19 UTC
(In reply to comment #9)

+        // Zero == Runtime.getRuntime().availableProcessors()
+        // -ve  == Runtime.getRuntime().availableProcessors() - value
+        // These two are the same
+        result = Runtime.getRuntime().availableProcessors() - result;

result is negative, so it gets more threads than processors?

You would want "+ result" here and "+ value" in the comment and in docs.
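
To make the sign issue concrete, here is a small sketch of the corrected calculation. The class and method names are hypothetical, and clamping the result to at least one thread is an assumption added for the example, not something stated in this comment.

```java
/** Hypothetical sketch of the corrected thread-count calculation. */
final class StartStopThreadsSketch {

    /**
     * @param configured the configured value: positive = use as is,
     *                   zero = one thread per processor,
     *                   negative = processors minus the absolute value
     * @param processors the number of available processors
     */
    static int effectiveThreads(int configured, int processors) {
        if (configured > 0) {
            return configured;
        }
        // configured is zero or negative here, so ADDING it reduces the count.
        // Subtracting it (the bug pointed out above) would yield more threads than processors.
        int result = processors + configured;
        // Assumption for this sketch: never return fewer than one thread.
        return Math.max(result, 1);
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("configured -2 -> " + effectiveThreads(-2, processors));
    }
}
```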
Comment 11 Felix Schumacher 2011-10-11 13:53:26 UTC
Created attachment 27759 [details]
starting contexts in parallel using an executor

In my testing, I have found that my patch sometimes throws exceptions deep inside Tomcat. Those seem to come from incorrect locking of the digester in ContextConfig. I have corrected the initialization, so that FindBugs is happy.

But while testing, as I was writing this, I got:
java.lang.NullPointerException
        at org.apache.tomcat.util.digester.Digester.startElement(Digester.java:1231)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:501)
        at com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:179)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1343)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2755)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
        at org.apache.tomcat.util.digester.Digester.parse(Digester.java:1537)
        at org.apache.catalina.startup.ContextConfig.processContextConfig(ContextConfig.java:650)
        at org.apache.catalina.startup.ContextConfig.contextConfig(ContextConfig.java:607)
        at org.apache.catalina.startup.ContextConfig.init(ContextConfig.java:845)
        at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:340)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
        at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
        at org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:389)
        at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:110)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:139)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:812)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:787)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:655)
        at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:653)
        at org.apache.catalina.startup.HostConfig$1.run(HostConfig.java:563)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

After that the digester seems to be unusable and I get the following:
org.xml.sax.SAXException: FWK005 parse may not be called while parsing.
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1245)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
        at org.apache.tomcat.util.digester.Digester.parse(Digester.java:1537)
        at org.apache.catalina.startup.ContextConfig.processContextConfig(ContextConfig.java:650)
        at org.apache.catalina.startup.ContextConfig.contextConfig(ContextConfig.java:587)
        at org.apache.catalina.startup.ContextConfig.init(ContextConfig.java:845)
        at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:340)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
        at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
        at org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:389)
        at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:110)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:139)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:812)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:787)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:655)
        at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:653)
        at org.apache.catalina.startup.HostConfig$1.run(HostConfig.java:563)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

After that I changed class variable digester in ContextConfig to an instance variable.
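
The general idea, one parser per task instead of one shared across threads, can be illustrated with a plain JAXP SAXParser standing in for the Digester. This is only a sketch under that assumption; the class and method names are hypothetical and none of this is ContextConfig code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: each parsing task creates its own parser. */
final class PerTaskParserSketch {

    /** Counts the elements in one document using a parser owned by this call. */
    static int countElements(String xml) throws Exception {
        // A fresh parser per call, so no parser state is shared between threads.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }

    /** Parses all documents in parallel and returns the element counts in input order. */
    static List<Integer> countAll(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (final String document : documents) {
                Callable<Integer> task = () -> countElements(document);
                futures.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : futures) {
                counts.add(future.get());
            }
            return counts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> documents = new ArrayList<>();
        documents.add("<Context><Resource/><Resource/></Context>");
        documents.add("<Context/>");
        System.out.println(countAll(documents, 2));
    }
}
```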

I also implemented a simple "same thread executor" to be used when only one thread is configured.
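
A "same thread executor" of this kind can be very small. The following is a generic sketch with a hypothetical class name, not the implementation from the attachment:

```java
import java.util.concurrent.Executor;

/** Hypothetical sketch: an Executor that runs each task on the calling thread. */
final class SameThreadExecutorSketch implements Executor {

    @Override
    public void execute(Runnable command) {
        // No hand-off to another thread: the task runs before execute() returns.
        command.run();
    }

    public static void main(String[] args) {
        Executor executor = new SameThreadExecutorSketch();
        executor.execute(() ->
                System.out.println("Running on " + Thread.currentThread().getName()));
    }
}
```

With only one thread configured, this keeps the deployment code path identical while avoiding the cost of creating a pool.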

I renamed parallelDeployment to startupConcurrency as suggested by Rainer and extended Host to expose getters and setters.
Comment 12 Felix Schumacher 2011-10-11 14:49:21 UTC
Created attachment 27760 [details]
make ContextConfig threadsafe

Since Mark's patch is more elegant and complete than mine, it makes mine obsolete. But it has the same problem with the missing threadsafety of ContextConfig.

The attached patch makes Digester a member variable and the initialization more correct. With that I haven't seen an exception (yet).
Comment 13 Mark Thomas 2011-10-11 15:17:17 UTC
I've fixed the +/- issue locally and will include that in the next version of the patch. Thanks Konstantin for the catch.

Felix, I think you have found one of the places where it is assumed contexts are processed serially. I'm pretty sure there will be others, or places where multiple threads don't help because of syncs (e.g. processing web.xml). I'll take a look through the Context init code and see what I can find. I'll include any fixes in the next version of the patch.
Comment 14 Mark Thomas 2011-10-11 17:12:24 UTC
Created attachment 27761 [details]
Threaded start, stop and deployment for Contexts

Updated patch that:
- fixes the issues identified by Konstantin
- includes a variation of Felix's patch for context.xml parsing
- fixes a similar issue with web.xml parsing
- fixes an issue that meant the host's executor spun up threads every time it checked for new apps to deploy

With this patch applied, I see around a 30% improvement in start time for tens of small, simple applications. This is better, but not the improvement I was hoping for with 4 threads on an 8-core machine. The bulk of the time appears to be spent in XML parsing.

I'm continuing to look into this to see if there is scope for further improvement. Suggestions and/or additional analysis welcome.

Note: The overhead of creating a digester per app is noticeable at ~5% but I think it is a price worth paying.
Comment 15 Pid 2011-10-11 18:05:57 UTC
There is a new Digester release (3.0), but I have no idea whether it would significantly improve speed and I suspect it's incompatible so could require non-trivial modifications elsewhere.
Comment 16 Konstantin Kolinko 2011-10-11 20:32:53 UTC
(In reply to comment #15)
> There is a new Digester release (3.0), but I have no idea whether it would
> significantly improve speed

From a threading point of view it is still the same: Rule, Digester and the XML parser can each be used by only one thread at a time. I do not think that XML parsers can be multi-threaded.

There is new API to declare a factory that creates sets of rules (binder.RulesModule), but we already do something similar, e.g. WebRuleSet#addRuleInstances().

Digester 3.0 release notes:
http://commons.apache.org/digester/commons-digester-3.0/RELEASE-NOTES.txt
Comment 17 Felix Schumacher 2011-10-12 08:45:45 UTC
The HashMap HostConfig#deployed is used by multiple threads, but is not synchronized.
So there could be problems, even if I haven't seen any yet.

We could either wrap it using Collections.synchronizedMap, change it to a real concurrent Map implementation, or use the Futures we get from the executorService to manipulate it again in a single thread.
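
As an illustration of the second option, here is a sketch that lets several deployment threads record their results in a ConcurrentHashMap. The class, the method and the map contents (context path mapped to a timestamp) are invented for the example and do not reflect the actual type of HostConfig#deployed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: deployment threads sharing a concurrent map. */
final class DeployedMapSketch {

    /** "Deploys" the given number of apps on a pool and returns the shared map. */
    static Map<String, Long> deployAll(int apps, int threads) {
        // ConcurrentHashMap tolerates concurrent put() calls; a plain HashMap does not.
        final Map<String, Long> deployed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < apps; i++) {
            final String contextPath = "/app" + i;
            pool.execute(() -> deployed.put(contextPath, System.currentTimeMillis()));
        }
        pool.shutdown();
        try {
            // Wait for all deployment tasks before handing the map back.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Deployment tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return deployed;
    }

    public static void main(String[] args) {
        System.out.println(deployAll(20, 4).size() + " applications recorded");
    }
}
```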
Comment 18 Mark Thomas 2011-10-13 10:37:08 UTC
Created attachment 27767 [details]
Threaded start, stop and deployment for Contexts

Updated patch that:
- fixes the concurrency issue with the map of deployed applications
- completes the remaining TODOs in the patch
- removes the use of threads to start/stop listeners etc to prevent memory leaks as that is no longer required if all start/stop is done on a separate, short-lived thread

I think this patch is getting pretty close now. Feedback from users with large numbers of apps would be useful.
Comment 19 Konstantin Kolinko 2011-10-13 11:32:13 UTC
(In reply to comment #18)
> Created attachment 27767 [details]

Re: startStopExecutor.allowCoreThreadTimeOut(true);

I think that just using "0" instead of getStartStopThreadsInternal() as the value of first argument (corePoolSize) in ThreadPoolExecutor constructor will have the same effect. It is not much of a difference though.

Re: Iterator<Future<Void>> iter = results.iterator();

It could be rewritten as for(Future<Void> future: results) loop.
In one place Future<?> is used, while I think it could be Future<Void> like in other places.

Re: HostConfig

I do not quite understand why to remove
"if (deploymentExists(cn.getName())) { return; }"
from the beginning of e.g. deployDescriptor() method.

The HostConfig#deployApps() method is called every 10 seconds to perform autodeployment (by HostConfig#check() called by HostConfig#lifecycleEvent())
and without early return it will proceed to parsing context.xml file.

Renaming s/dir/war/ can be done now, to slightly reduce future patch.
Comment 20 Konstantin Kolinko 2011-10-13 11:40:13 UTC
> (In reply to comment #18)
> > Created attachment 27767 [details]

Re: HostConfig, one more:

-hostConfig.deployWar=Deploying web application archive {0}

The above message should not have been removed from LocalStrings.properties file. It is used.

Re: docs/config/host.xml, engine.xml:

Maybe move the phrase about the default value to the end of the description.
Comment 21 Mark Thomas 2011-10-13 12:46:23 UTC
Created attachment 27769 [details]
Threaded start, stop and deployment for Contexts

Updated patch that addresses review comments so far.
Comment 22 Felix Schumacher 2011-10-13 13:42:01 UTC
In ContainerBase#initInternal the ThreadPoolExecutor gets initialized with a core pool size of "0", but if we call ContainerBase#setStartStopThreads core pool size gets set to maximum pool size. Is this intended, or have I misinterpreted the code?
Comment 23 Felix Schumacher 2011-10-13 14:41:42 UTC
With core pool size set to "0" in ContainerBase#initInternal I get no concurrency at startup. It will be sequential only. If I change it back to 

  startStopExecutor = new ThreadPoolExecutor(getStartStopThreadsInternal(),
                getStartStopThreadsInternal(), 10, TimeUnit.SECONDS,
                startStopQueue);

I get a concurrent startup. (Startup time for my 20 dummy applications go down from 16s to 9s)


As ContainerBase#initInternal is also called from StandardContext, each Context will get its own startStopExecutor. Is this really needed?
Comment 24 Mark Thomas 2011-10-13 15:01:23 UTC
(In reply to comment #23)
> With core pool size set to "0" in ContainerBase#initInternal I get no
> concurrency at startup. It will be sequential only. If I change it back to 
> 
>   startStopExecutor = new ThreadPoolExecutor(getStartStopThreadsInternal(),
>                 getStartStopThreadsInternal(), 10, TimeUnit.SECONDS,
>                 startStopQueue);
> 
> I get a concurrent startup. (Startup time for my 20 dummy applications go down
> from 16s to 9s)

I'll take another look at that.

> As ContainerBase#initInternal is also called from StandardContext, each Context
> will get its own startStopExecutor. Is this really needed?

It isn't used at the moment and is likely to stay that way.
Comment 25 Mark Thomas 2011-10-13 23:04:51 UTC
(In reply to comment #19)
> Re: startStopExecutor.allowCoreThreadTimeOut(true);
> 
> I think that just using "0" instead of getStartStopThreadsInternal() as the
> value of first argument (corePoolSize) in ThreadPoolExecutor constructor will
> have the same effect. It is not much of a difference though.

This doesn't work: since the queue is unbounded, no more than one thread is ever created.
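
This behaviour of ThreadPoolExecutor can be reproduced outside Tomcat. The sketch below uses a hypothetical class and method name and sleeping dummy tasks; it compares the largest pool size reached with a core pool size of 0 against a core pool size equal to the maximum, both with an unbounded queue.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: effect of corePoolSize when the work queue is unbounded. */
final class CorePoolSizeSketch {

    /** Submits short dummy tasks and returns the largest number of threads the pool ever had. */
    static int largestPoolSize(int corePoolSize, int maxPoolSize, int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, 10, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        if (corePoolSize > 0) {
            // Lets idle core threads exit after the keep-alive time.
            executor.allowCoreThreadTimeOut(true);
        }
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                try {
                    Thread.sleep(20); // stand-in for starting a context
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executor.getLargestPoolSize();
    }

    public static void main(String[] args) {
        System.out.println("core=0, max=4: " + largestPoolSize(0, 4, 8) + " thread(s)");
        System.out.println("core=4, max=4: " + largestPoolSize(4, 4, 8) + " thread(s)");
    }
}
```

The executor only grows beyond the core size when the queue rejects a task, which an unbounded queue never does. Setting the core size to the desired thread count and allowing core threads to time out gives the parallelism without keeping idle threads around.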
Comment 26 Mark Thomas 2011-10-13 23:19:14 UTC
Created attachment 27772 [details]
Threaded start, stop and deployment for Contexts

Updated version of the patch that restores the ability to start contexts in parallel.
The overhead (with the TCK webapps) of using a single thread is roughly what we have gained caching the global web.xml so users should see no change in the default config.
On an 8-core machine (and with the TCK webapps) I see a 50% reduction in start time when I use 4 threads.
Comment 27 Mark Thomas 2011-10-25 17:27:35 UTC
Created attachment 27846 [details]
Threaded start, stop and deployment for Contexts

Updated patch without the line-ending issue of the previous one. I intend to apply this in the next day or so.
Comment 28 Mark Thomas 2011-10-28 08:02:48 UTC
This has been implemented in trunk and 7.0.x and will be included in 7.0.23 onwards.
Comment 29 Guido Leenders 2011-12-04 23:00:39 UTC
Some experience figures for 24 contexts, of which 12 are heavy applications:

* original startup time with 7.0.21: 280 seconds
* with startStopThreads="16": 30 seconds

Thank you!
Comment 30 Leslie 2012-02-13 07:31:36 UTC
Where do we set the startStopThreads parameter value?
Comment 31 Christopher Schultz 2012-02-13 20:24:39 UTC
(In reply to comment #30)
> Where do we set the startStopThreads parameter value?

In the <Engine> component:
http://tomcat.apache.org/tomcat-7.0-doc/config/engine.html

Please use the users' list for questions in the future.
Comment 32 Chuck Caldarale 2012-02-13 20:41:25 UTC
(In reply to comment #31)
> (In reply to comment #30)
> > Where do we set the startStopThreads parameter value?
> 
> In the <Engine> component:
> http://tomcat.apache.org/tomcat-7.0-doc/config/engine.html

Also <Host>:
http://tomcat.apache.org/tomcat-7.0-doc/config/host.html

> Please use the users' list for questions in the future.

Absolutely.