Commons Pool
  1. Commons Pool
  2. POOL-161

ContextClassLoader problems for the Evictor thread

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.4
    • Fix Version/s: 1.6.1, 2.0, 1.5.8
    • Labels:
      None

      Description

      Since a single Timer is used for several GenericObjectPool instances, this may create classloader issues and a memory leak of one classloader :

      Let's imagine the following scenario :

      • commons-pool.jar is in the classpath of a webapp container (e.g. tomcat).
      • 2 webapps A and B are deployed, each creating an instance of GenericObjectPool for its own usage.
      • each webapp makes use of the idle object evictor and sets a positive number for minIdle
      • first, webapp A instantiates its GenericObjectPool. Since this is the first TimerTask to be created, the Timer instance is created, thus creating a Thread whose ContextClassLoader is the current one, that is webapp A's ContextClassLoader.
        The TimerTask properly creates instances of idle objects in the pool, making use of the ObjectFactory provided by A.
      • then B instantiates its GenericObjectPool. A new TimerTask is created, and it tries to invoke the ObjectFactory provided by B. But when it needs a class that only exists in B webapp, it cannot find it because the ContextClassLoader of the Timer Thread is A's classloader.

      Other side effect : if webapp A is undeployed, but B is still running, then A's webappClassLoader cannot be GCed because the Timer Thread keeps a strong reference to A's classloader (as its context classloader).

      1. TestGenericObjectPoolClassLoader.patch.txt
        4 kB
        Sylvain Laurent
      2. patch_Evictor_CCL.txt
        3 kB
        Sylvain Laurent

        Activity

        Hide
        Sylvain Laurent added a comment -

        added TestGenericObjectPoolClassLoader.patch.txt to showcase the problem.

        Show
        Sylvain Laurent added a comment - added TestGenericObjectPoolClassLoader.patch.txt to showcase the problem.
        Hide
        Sylvain Laurent added a comment -

        attached patch_Evictor_CCL.txt : proposed patch to save the CCL that is used when the factory is set in the pool (either in constructor or in setFactory() ), and then use it in the run() method of the TimerTask to set the CCL to the correct one.

        Show
        Sylvain Laurent added a comment - attached patch_Evictor_CCL.txt : proposed patch to save the CCL that is used when the factory is set in the pool (either in constructor or in setFactory() ), and then use it in the run() method of the TimerTask to set the CCL to the correct one.
        Hide
        Mark Thomas added a comment -

        I wonder if we want to support this use case. The simple solution would be to package commons-pool in each web application. Disk is cheap, memory is cheap. Is the memory and disk space saved worth the additional complexity?

        I am currently looking to fix a related issue in Tomcat's use of commons-dbcp and commons-pool. I believe I'll be able to fix that entirely within Tomcat but if not, it will likely impact this fix. I'll report back here with any progress on the Tomcat issue.

        Show
        Mark Thomas added a comment - I wonder if we want to support this use case. The simple solution would be to package commons-pool in each web application. Disk is cheap, memory is cheap. Is the memory and disk space saved worth the additional complexity? I am currently looking to fix a related issue in Tomcat's use of commons-dbcp and commons-pool. I believe I'll be able to fix that entirely within Tomcat but if not, it will likely impact this fix. I'll report back here with any progress on the Tomcat issue.
        Hide
        Mark Thomas added a comment -

        It wasn't possible to fix the related issue in Tomcat so I have fixed it in pool.

        The issue was that the context class loader for the TimerThread would be set to whatever class laoder happened to be in use when the Thread was created. That could trigger a memory leak in multiple class loader environments. The fix should not impact the proposed patch since the fix sets the default class loader that this patch stores whilst an alternative class loader is used and then restores.

        Show
        Mark Thomas added a comment - It wasn't possible to fix the related issue in Tomcat so I have fixed it in pool. The issue was that the context class loader for the TimerThread would be set to whatever class laoder happened to be in use when the Thread was created. That could trigger a memory leak in multiple class loader environments. The fix should not impact the proposed patch since the fix sets the default class loader that this patch stores whilst an alternative class loader is used and then restores.
        Hide
        Sebb added a comment -

        Disk space and memory may be cheap, but finding the memory leak in a complicated web application and/or reconfiguring the application to avoid it is probably not cheap.

        So unless the fix turns out to be considerably more complex than the patch suggests, I think it would be worth implementing.

        Show
        Sebb added a comment - Disk space and memory may be cheap, but finding the memory leak in a complicated web application and/or reconfiguring the application to avoid it is probably not cheap. So unless the fix turns out to be considerably more complex than the patch suggests, I think it would be worth implementing.
        Hide
        Mark Thomas added a comment -

        My main concern is what other as yet undiscovered memory leaks may be lurking in the code. Having each web app (in this use case) use it's own copy of pool will avoid any such issues entirely. Fixing this issue sets us off down the potentially hugely complicated path of supporting the use of a single instance of commons pool across multiple class loaders. That is the complexity that concerns me.

        Show
        Mark Thomas added a comment - My main concern is what other as yet undiscovered memory leaks may be lurking in the code. Having each web app (in this use case) use it's own copy of pool will avoid any such issues entirely. Fixing this issue sets us off down the potentially hugely complicated path of supporting the use of a single instance of commons pool across multiple class loaders. That is the complexity that concerns me.
        Hide
        Sebb added a comment -

        I see.

        However it seems to me that it's still worth doing - it may be the only class-loader leak, and even if not, it will help some users.

        We don't have to guarantee that POOL is free of class-loader issues, but at least we could fix ones that don't involve a huge effort.

        We also need to clearly document that the code may have class-loader issues, and the work-round.

        Show
        Sebb added a comment - I see. However it seems to me that it's still worth doing - it may be the only class-loader leak, and even if not, it will help some users. We don't have to guarantee that POOL is free of class-loader issues, but at least we could fix ones that don't involve a huge effort. We also need to clearly document that the code may have class-loader issues, and the work-round.
        Hide
        Sylvain Laurent added a comment -

        I think commons-pool should be considered as the kind of infrastructure library that should be multi-classloader aware. For instance, it is often used through commons-dbcp for datasources declared in tomcat's server.xml.
        In such a scenario, one can have a single datasource shared by multiple webapps...

        Mark, if you don't intend to apply the patch I proposed for the next release of commons-pool, then this issue should be split in 2, one for the fix you did, one to fix the CCL before calling the ObjectFactory.

        Show
        Sylvain Laurent added a comment - I think commons-pool should be considered as the kind of infrastructure library that should be multi-classloader aware. For instance, it is often used through commons-dbcp for datasources declared in tomcat's server.xml. In such a scenario, one can have a single datasource shared by multiple webapps... Mark, if you don't intend to apply the patch I proposed for the next release of commons-pool, then this issue should be split in 2, one for the fix you did, one to fix the CCL before calling the ObjectFactory.
        Hide
        Phil Steitz added a comment -

        We seem to be talking about three different issues here:

        1. The issue that Mark's commit fixed - potential for memory leak when the class loader for the eviction thread is not the same loader that loaded the library. IIUC, Mark's patch fully fixes this issue and any other potential for classloader leaks associated with the evictor. Correct?

        2. Feature request to support the use case in the issue description. Looks to me like Mark's patch actually does fully fix the issue as narrowly defined above. I must be missing something here.

        3. Feature request for pool to work in arbitrary multi-classloader environments

        I agree with Mark that it is not obvious that we should aim to support 2., and I think we have some work do to in defining exactly what 3. means. I am strongly +1 on doing everything we can to avoid leaks.

        Show
        Phil Steitz added a comment - We seem to be talking about three different issues here: 1. The issue that Mark's commit fixed - potential for memory leak when the class loader for the eviction thread is not the same loader that loaded the library. IIUC, Mark's patch fully fixes this issue and any other potential for classloader leaks associated with the evictor. Correct? 2. Feature request to support the use case in the issue description. Looks to me like Mark's patch actually does fully fix the issue as narrowly defined above. I must be missing something here. 3. Feature request for pool to work in arbitrary multi-classloader environments I agree with Mark that it is not obvious that we should aim to support 2., and I think we have some work do to in defining exactly what 3. means. I am strongly +1 on doing everything we can to avoid leaks.
        Hide
        Sylvain Laurent added a comment -

        1. Correct

        2. No. Mark's patch fixes the CCL when the TimerThread is started. The patch I provided allows to have the ObjectFactory called with a different CCL just for the duration of the call. So there are no leak problems involved here for case #2, just a use case in a multi-classloader environment

        3. The issue I raised did not ask to review the whole commons-pool usage in multi-classloader environment, I just proposed to fix a use case I encountered. Actually I first discovered the leak and then found out about the issue about multi-classloader usage...

        So, If my patch were applied, I would consider the issue as totally fixed.

        Show
        Sylvain Laurent added a comment - 1. Correct 2. No. Mark's patch fixes the CCL when the TimerThread is started. The patch I provided allows to have the ObjectFactory called with a different CCL just for the duration of the call. So there are no leak problems involved here for case #2, just a use case in a multi-classloader environment 3. The issue I raised did not ask to review the whole commons-pool usage in multi-classloader environment, I just proposed to fix a use case I encountered. Actually I first discovered the leak and then found out about the issue about multi-classloader usage... So, If my patch were applied, I would consider the issue as totally fixed.
        Hide
        Phil Steitz added a comment -

        Thanks, Sylvain. My mistake on 2.

        I am on the fence as to whether to apply Sylvain's patch for 1.5.x or wait until 2.0 to fully fix the issue. Does anyone see any risk in applying the patch in 1.5.5? If we don't apply it, we should at least update the docs to make it clear that the evictor is nailed to the loader used to load the library.

        Show
        Phil Steitz added a comment - Thanks, Sylvain. My mistake on 2. I am on the fence as to whether to apply Sylvain's patch for 1.5.x or wait until 2.0 to fully fix the issue. Does anyone see any risk in applying the patch in 1.5.5? If we don't apply it, we should at least update the docs to make it clear that the evictor is nailed to the loader used to load the library.
        Hide
        Craig Ringer added a comment -

        This has been biting me rather severely. It's not "only" a classloader leak because, by leaking the classloader it can become impossible to unload huge parts of the web application because so many Java libraries like to keep static caches.

        Mark Thomas: It's not correct to state that having each app use its own copy of commons-pool will fix the issue. In fact, I encountered the issue precisely because my app deploys its own copy of commons-pool. I landed up with a timer thread used by commons-pool in a ThreadGroup owned by the app server. The timer thread holds a reference to my classloader via contextClassLoader, so my app's classloader cannot be unloaded.

        An effective workaround appears to be to move apache commons pool into the application server's set of provided classes. For Glassfish, that's glassfish/domains/domain1/lib .

        Show
        Craig Ringer added a comment - This has been biting me rather severely. It's not "only" a classloader leak because, by leaking the classloader it can become impossible to unload huge parts of the web application because so many Java libraries like to keep static caches. Mark Thomas: It's not correct to state that having each app use its own copy of commons-pool will fix the issue. In fact, I encountered the issue precisely because my app deploys its own copy of commons-pool. I landed up with a timer thread used by commons-pool in a ThreadGroup owned by the app server. The timer thread holds a reference to my classloader via contextClassLoader, so my app's classloader cannot be unloaded. An effective workaround appears to be to move apache commons pool into the application server's set of provided classes. For Glassfish, that's glassfish/domains/domain1/lib .
        Hide
        Mark Thomas added a comment -

        Craig: Per context use of commons pool does avoid this bug and does not trigger a memory leak. The problem you are describing is not a pool bug but the application's (or possibly the container's depending on exact usage) failure to correctly shutdown the pool when the context stops. For the record Tomcat handles this correctly. The workaround you describe for glassfish will trigger the bug that Sylvain describes in point 2.

        Sylvain: Thanks for the patch. It has been applied - with some minor tweaks to 1.6.x and will be included in 1.6.1 onwards. I will back-port it to 1.5.x forward port it to 2.x shortly.

        Show
        Mark Thomas added a comment - Craig: Per context use of commons pool does avoid this bug and does not trigger a memory leak. The problem you are describing is not a pool bug but the application's (or possibly the container's depending on exact usage) failure to correctly shutdown the pool when the context stops. For the record Tomcat handles this correctly. The workaround you describe for glassfish will trigger the bug that Sylvain describes in point 2. Sylvain: Thanks for the patch. It has been applied - with some minor tweaks to 1.6.x and will be included in 1.6.1 onwards. I will back-port it to 1.5.x forward port it to 2.x shortly.
        Hide
        Gary Gregory added a comment -

        We'll need a 1.5 branch for this in 1.5.8.

        Show
        Gary Gregory added a comment - We'll need a 1.5 branch for this in 1.5.8.
        Hide
        Mark Thomas added a comment -

        Huh? Could you explain that comment please. We already have a 1.5.x branch.

        Show
        Mark Thomas added a comment - Huh? Could you explain that comment please. We already have a 1.5.x branch.
        Hide
        Gary Gregory added a comment -

        Sorry I confused 1_5_RELEASE (used for 1.5.x) and POOL_1_X (the latest release from there is 1.6)

        Show
        Gary Gregory added a comment - Sorry I confused 1_5_RELEASE (used for 1.5.x) and POOL_1_X (the latest release from there is 1.6)
        Hide
        Mark Thomas added a comment -

        Fix ported to 1.5.x and 2.x

        Show
        Mark Thomas added a comment - Fix ported to 1.5.x and 2.x
        Hide
        Matus Ferko added a comment -

        HI,
        is this fixed in any 1.5.x Release/tag version?
        I can't find 1.5.8 in maven repository

        Show
        Matus Ferko added a comment - HI, is this fixed in any 1.5.x Release/tag version? I can't find 1.5.8 in maven repository

          People

          • Assignee:
            Unassigned
            Reporter:
            Sylvain Laurent
          • Votes:
            4 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development