Bug 49159 - Improve ThreadLocal memory leak clean-up
Summary: Improve ThreadLocal memory leak clean-up
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 7
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 7.0.6
Hardware: All All
: P2 enhancement with 1 vote (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
: 49668 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-04-20 13:33 UTC by Mark Thomas
Modified: 2019-01-25 04:07 UTC (History)
2 users (show)



Attachments
Patch for tomcat 7 to renew threads (30.08 KB, patch)
2010-05-06 17:15 UTC, Sylvain Laurent
Details | Diff
Alternative approach to renewing threads (4.79 KB, patch)
2010-09-23 10:49 UTC, Pid
Details | Diff
proposed patch to renew threads one by one with a delay (11.03 KB, patch)
2010-09-25 18:42 UTC, Sylvain Laurent
Details | Diff
Patch for tomcat 7 to renew threads one by one with a delay without requiring a fair queue (18.42 KB, patch)
2010-09-28 17:34 UTC, Sylvain Laurent
Details | Diff
tc7 renew threads one by one in a bounded time (18.62 KB, patch)
2010-09-30 18:05 UTC, Sylvain Laurent
Details | Diff
patch tc7 renew threads RC1 (42.32 KB, patch)
2010-10-09 17:14 UTC, Sylvain Laurent
Details | Diff
patch tc7 renew threads RC2 (43.72 KB, patch)
2010-10-11 16:50 UTC, Sylvain Laurent
Details | Diff
patch tc6 renewThreads 2010-10-11 (56.30 KB, patch)
2010-10-11 16:54 UTC, Sylvain Laurent
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Thomas 2010-04-20 13:33:43 UTC
Doing this in a thread-safe way means performing the clean-up in the thread where the ThreadLocal exists. A likely point is just before the Thread gets returned to the pool.
Comment 1 Sylvain Laurent 2010-04-26 16:40:49 UTC
I did experiment with cleaning the threadlocals from their owning thread by doing it in org.apache.tomcat.util.threads.ThreadPoolExecutor.afterExecute(Runnable, Throwable)

It does work (i.e. it improves leak protection for the http://wiki.apache.org/tomcat/MemoryLeakProtection#webappClassInstanceAsThreadLocalIndirectValue case ), but since it clear all thread locals after each request, it breaks some optimizations that are done by some frameworks, applications or even tomcat itself.

I'm experimenting with the approach of recreating the thread pool, it seems cleaner and more efficient.
Comment 2 Mark Thomas 2010-04-28 08:23:05 UTC
It shouldn't clean all thread locals after each request, it should clean all thread locals loaded by a web application.

If the cleaning breaks apps or frameworks then those apps/frameworks are broken since they are leaking memory and should be fixed. One option to reduce side-effects would be only to clean the thread locals if the WebAppClassloader is stopped. The downside is that memory leaks won't be fixed immediately.

The ThreadLocals created by Tomcat will be loaded by the common class loader. Those shouldn't be causing any memory leaks, even in the embedded case since Tomcat will stop all the threads it created.
Comment 3 Sylvain Laurent 2010-05-06 17:15:35 UTC
Created attachment 25411 [details]
Patch for tomcat 7 to renew threads

Here is finally my proposition of patch for tomcat 7 to recreate the ThreadPoolExecutor of each Executor when a Context is stopped.

I also removed the clearReferencesThreadLocals property on WebApp[Class]Loader since my patch makes it useless and I think this feature is too unsafe.

I'm working on another patch for tomcat 6. There's a little more work to do because of the old WorkerThread stuff that has been replace by ThreadPoolExecutor in trunk.
Comment 4 Arjen Knibbe 2010-07-30 04:02:32 UTC
*** Bug 49668 has been marked as a duplicate of this bug. ***
Comment 5 Pid 2010-09-23 10:49:43 UTC
Created attachment 26068 [details]
Alternative approach to renewing threads

Instead of completely renewing the entire thread pool, it's possible to do it incrementally, which might be less invasive, in the event of a large number of applications being updated at the same time.

This patch adds a TaskThread class and modifies the StandardContextValve. The goal is to terminate a class after it's used, when it's discovered to be older than the Context which it last serviced.

This approach wouldn't be as comprehensive as replacing the entire pool, but should cause the pool to refresh incrementally over time. Further enhancement could include self-cleaning of thread locals etc.
Comment 6 Sylvain Laurent 2010-09-23 16:51:38 UTC
Pid, I applied your patch but it does not work.
Actually when the thread calls join(0) on itself it just leads to a deadlock : the Thread will remain forever in the join method unless it is interrupted.

Furthermore your idea works only if a new context is started. If a context is stopped and no other context is started, Threads won't be renewed.
Comment 7 Pid 2010-09-23 17:55:09 UTC
(In reply to comment #6)
> Pid, I applied your patch but it does not work.
> Actually when the thread calls join(0) on itself it just leads to a deadlock :
> the Thread will remain forever in the join method unless it is interrupted.

It's experimental admittedly. I'll clean up my SVN repo & retry it, thought it was working...
If it won't work at all, then the idea is dead, but there must be a way to terminate an individual thread & remove it from the pool.

> Furthermore your idea works only if a new context is started. 

Or an existing one is restarted, the point is that it's a compromise.
The idea was to have a lower impact than restarting the whole thread pool.

> If a context is stopped and no other context is started, Threads won't be renewed.

If a context is stopped and no other context is started, how will the server work?
Comment 8 Sylvain Laurent 2010-09-25 18:42:53 UTC
Created attachment 26074 [details]
proposed patch to renew threads one by one with a delay

I explored your idea and managed to force threads to "kill themselves" after a context is stopped.
Here are the main points of my new patch :

- I save the last time a context is stopped in a static variable of StandardContext (this could be improved to record it somewhere at the Service or Engine level, in a instance variable)
- The "suicide" is done in ThreadPoolExecutor.afterExecute(), in case the thread was created before the last context was stopped.
- the rate at which threads are renewed is throttled to avoid renewing all threads at the same time (otherwise it would have been the same as my initial proposal to renew the whole pool).
- I had to make TaskQueue extend ArrayBlockingQueue instead of LinkedBlockingQueue because the latter does not propose to have a "fair" policy. Fairness is required so that every thread of the pool is given the opportunity to be killed.
- Unlike LinkedBlockingQueue, ArrayBlockingQueue requires a reasonable size (not Integer.MAX_VALUE). I set it to 100 by default (configurable with the maxQueueSize property at the Connector or Executor level)

I'm quite happy with this patch except the fairness requirement :
- The javadocs of ArrayBlockingQueue say that it may decrease performance, but I don't how much
- Since each thread of the pool is used in turn, we might lose some caching effects
- and it also prevents the size of the pool to be decreased in case of low activity : without fairness, some Thread could timeout waiting for an element from the queue while other threads could serve new requests. Eventually the pool could decide to decrease it size.

What do you think ? If we don't use a fair queue, we do not have a deterministic behavior and some thread might still trigger a memory leak...
Comment 9 Sylvain Laurent 2010-09-25 18:51:53 UTC
> What do you think ? If we don't use a fair queue, we do not have a
> deterministic behavior and some thread might still trigger a memory leak...

Unless we have minSpareThreads=0. This way even threads that are no longer used will eventually be stopped (the timeout is 60s by default, maxIdleTime in StandardThreadExecutor and hardcoded in AbstractEndpoint.createExecutor() )
Comment 10 Sylvain Laurent 2010-09-28 17:34:48 UTC
Created attachment 26097 [details]
Patch for tomcat 7 to renew threads one by one with a delay without requiring a fair queue

I worked a little more on my patch to avoid requiring a fair queue. I actually reused part of my very first patch (the Listener). Basically here is what is done :

- idle threads are stopped all at once, even core pool threads. This is done by ThreadLocalLeakPreventionListener
- like my previous patch, active threads are stopped one by one with a delay.

All in all, this avoids performance impacts under load since only idle threads will be stopped as a whole, and other threads will be renewed slowly.

This memory leak protection is not 100% deterministic since one could think of scenarios where the load decreases just after a leaking context is stopped, so that some (core pool) threads are no longer used but still present in the pool and thus triggering the memory leak...
But it's a compromise between performance impacts and the duration of the memory leak.

I'd be glad to have some feedback on this patch!
Comment 11 Sylvain Laurent 2010-09-30 18:05:48 UTC
Created attachment 26108 [details]
tc7 renew threads one by one in a bounded time

New patch with further improvements : now the behavior is more deterministic, with an upper bound to the time necessary to renew all the threads of the pool. The upper bound is something like N*max(threadKeepAliveTimeout, longestRequest + threadRenewalDelay). Where N is the number of threads in the pool and longestRequest is the maximum time a request takes to be processed (of course it depends on the application).
This is really a worst case scenario and it should be much quicker in usual cases, something closer to max(threadKeepAliveTimeout, longestRequest).

I still have to make the threadRenewalDelay configurable (hardcoded to 1s for now) and to propose some cleanups in WebAppClassLoader since this patch makes the unsafe-and-disabled-by-default ThreadLocal cleaning obsolete. (it's still interesting to introspect the ThreadLocals to detect potential leaks and warn the user, it will help to improve libraries and applications...).

Any volunteer for a review before I continue finalizing this patch ?
Comment 12 Sylvain Laurent 2010-10-09 17:14:35 UTC
Created attachment 26150 [details]
patch tc7 renew threads RC1

Here is my proposed "RC1" patch for tc7.

- new property threadRenewalDelay on StandardThreadExecutor (default is 1000ms)
- new property renewThreadsWhenStoppingContext on StandardContext (default is true)
- removed property clearReferencesThreadLocals on StandardContext
- Transformed the clearReferencesThreadLocals behavior of WebappClassLoader into a checkThreadLocalsForLeaks behavior
- updated documentation
- successfully performed some tests with JMeter

I'll try to propose an equivalent patch for tomcat 6 in a few days...
Comment 13 Sylvain Laurent 2010-10-11 16:50:39 UTC
Created attachment 26156 [details]
patch tc7 renew threads RC2

patch "RC2" for tomcat 7 : using ProtocolHandler.getExecutor() instead of testing each possible implementation (the method was not present on ProtocolHandler a couple of weeks ago).
Comment 14 Sylvain Laurent 2010-10-11 16:54:31 UTC
Created attachment 26157 [details]
patch tc6 renewThreads 2010-10-11

Patch for tomcat 6.

It is essentially the same behavior as for tomcat 7, excepted that it requires that (at least) one Executor be configured and the Connectors be configured to use that(these) Executor(s).
So, the memory leak protection is not enabled by default, unlike for tomcat 7 which always uses an Executor.
Comment 15 Sylvain Laurent 2010-11-25 16:25:32 UTC
Though my proposed patch renews threads in the pool, there are actually 2 other types of threads that can prevent an application from being garbage collected if it has ThreadLocal-related leaks : 
- the "main" thread (the one when starting up tomcat)
- the ContainerBackgroundProcessor thread
(- the shutdown hook thread too, but since the JVM is shutting down it does not really matter)
- in an embedded scenario, the thread of the "containing" application

Currently when a webapp is started or stopped, the lifecycle callbacks into the webapp (context listener, filters, servlet) are executed by one of those threads.

I'm working on a patch where I spawn a thread that runs just during those callbacks then shuts down, and so far it prevents such leaks.
Comment 16 Sylvain Laurent 2010-12-05 18:00:05 UTC
Applied the patch to renew threads of the pool on trunk, will be available for 7.0.6.
Comment 17 Sylvain Laurent 2010-12-09 17:16:44 UTC
rev 1044145 : committed some more enhancements in case a web application creates some ThreadLocals that it does not clean during its startup or shutdown phases (calls to Listeners, Filters and Servlets). A dedicated thread is used in these phases.

Will be available for 7.0.6
Comment 18 Sylvain Laurent 2011-04-04 17:37:14 UTC
marking issue as resolved as of 7.0.6.
Comment 19 Torsten Krah 2012-01-12 14:24:58 UTC
Does the tomcat6 patch getting applied to any 6.x version? Or is this one only there in 7.0.6+?
Comment 20 Konstantin Kolinko 2012-01-12 15:21:11 UTC
No. This enhancement is too intrusive to be backported.
Comment 21 Joker 2019-01-25 04:07:54 UTC
(In reply to Sylvain Laurent from comment #11)
> Created attachment 26108 [details]
> tc7 renew threads one by one in a bounded time
> 
> New patch with further improvements : now the behavior is more
> deterministic, with an upper bound to the time necessary to renew all the
> threads of the pool. The upper bound is something like
> N*max(threadKeepAliveTimeout, longestRequest + threadRenewalDelay). Where N
> is the number of threads in the pool and longestRequest is the maximum time
> a request takes to be processed (of course it depends on the application).
> This is really a worst case scenario and it should be much quicker in usual
> cases, something closer to max(threadKeepAliveTimeout, longestRequest).
> 
> I still have to make the threadRenewalDelay configurable (hardcoded to 1s
> for now) and to propose some cleanups in WebAppClassLoader since this patch
> makes the unsafe-and-disabled-by-default ThreadLocal cleaning obsolete.
> (it's still interesting to introspect the ThreadLocals to detect potential
> leaks and warn the user, it will help to improve libraries and
> applications...).
> 
> Any volunteer for a review before I continue finalizing this patch ?

now the behavior is more
> deterministic  so does the 'more' means 100%(In reply to Sylvain Laurent from comment #11)
> Created attachment 26108 [details]
> tc7 renew threads one by one in a bounded time
> 
> New patch with further improvements : now the behavior is more
> deterministic, with an upper bound to the time necessary to renew all the
> threads of the pool. The upper bound is something like
> N*max(threadKeepAliveTimeout, longestRequest + threadRenewalDelay). Where N
> is the number of threads in the pool and longestRequest is the maximum time
> a request takes to be processed (of course it depends on the application).
> This is really a worst case scenario and it should be much quicker in usual
> cases, something closer to max(threadKeepAliveTimeout, longestRequest).
> 
> I still have to make the threadRenewalDelay configurable (hardcoded to 1s
> for now) and to propose some cleanups in WebAppClassLoader since this patch
> makes the unsafe-and-disabled-by-default ThreadLocal cleaning obsolete.
> (it's still interesting to introspect the ThreadLocals to detect potential
> leaks and warn the user, it will help to improve libraries and
> applications...).
> 
> Any volunteer for a review before I continue finalizing this patch ?

(In reply to Sylvain Laurent from comment #11)
> Created attachment 26108 [details]
> tc7 renew threads one by one in a bounded time
> 
> New patch with further improvements : now the behavior is more
> deterministic, with an upper bound to the time necessary to renew all the
> threads of the pool. The upper bound is something like
> N*max(threadKeepAliveTimeout, longestRequest + threadRenewalDelay). Where N
> is the number of threads in the pool and longestRequest is the maximum time
> a request takes to be processed (of course it depends on the application).
> This is really a worst case scenario and it should be much quicker in usual
> cases, something closer to max(threadKeepAliveTimeout, longestRequest).
> 
> I still have to make the threadRenewalDelay configurable (hardcoded to 1s
> for now) and to propose some cleanups in WebAppClassLoader since this patch
> makes the unsafe-and-disabled-by-default ThreadLocal cleaning obsolete.
> (it's still interesting to introspect the ThreadLocals to detect potential
> leaks and warn the user, it will help to improve libraries and
> applications...).
> 
> Any volunteer for a review before I continue finalizing this patch ?

'now the behavior is more deterministic'  ,does this 'more' means 100% ?