Bug 42742

Summary: big Ant/Ivy builds run out of permanent memory. Classloader leaks?
Product: Ant Reporter: Steve Loughran <stevel>
Component: Core Assignee: Ant Notifications List <notifications>
Status: NEW ---    
Severity: normal CC: kahmyong.moon
Priority: P2    
Version: 1.7.0   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   

Description Steve Loughran 2007-06-26 04:15:24 UTC
As you can see from http://jira.smartfrog.org/jira/browse/SFOS-167, we've
been running out of PermGenHeapSpace on Windows (XP and Vista) builds. We have a
workaround for this (increase the permgen heap size in ANT_OPTS), but the
problem remains.

PermGenHeapSpace is where the JVM keeps
 -intern'd strings
 -loaded classes and all the introspection stuff
By the end of a (working) build, we had 4000+ classes loaded, which is quite a lot.

What I'm wondering is whether antlibs and taskdefs are somehow leaking
classloaders and hence classes. All our (15+) child projects are designed to run
standalone, so they use ivy via antlib declarations and taskdef ivy and smartfrog
explicitly.

I could add more checks before declaring stuff, but I fear there may be some
fundamental leaking of stuff that big ivy cross-project builds are showing up.
Comment 1 Steve Loughran 2007-06-26 15:10:16 UTC
Note for the curious that the jira bugrep includes a heap dump that you can use
Java 6's jhat against (
http://jira.smartfrog.org/jira/secure/attachment/10030/java_pid1292.julio.zip ) 

If you look at the list of classes that are leaking, it's the custom tasks that
are being taskdef'd, in particular ivy, primarily because it's so big (once jhat
is running, the url is http://localhost:7000/allInstances/0x6ae5d00).

I'm loading these using typedef, rather than antlib URIs, because each task is
designed to be self-contained. I may be able to go back and skip the loading if
they are already on the classpath, but:
 1. we could maybe make this an option (reload=true/false) to check before loading
 2. can't we stop loading so many instances? Surely when we exit a project, it's
gone.
Comment 2 Peter Reilly 2007-06-27 03:44:39 UTC
It is hard to track down all memory leakages.
It would be nice to have a build file that showed
the problem (without ivy if possible or at least
without need for a network connection - i.e. self-contained).

For ant 1.7.0 a number of classloader related memory leakages
have been fixed - see for example:
http://issues.apache.org/bugzilla/show_bug.cgi?id=33061
Comment 3 Steve Loughran 2007-06-27 04:06:45 UTC
1. Peter, this is Ant 1.7.0 retail. There's a big check for older versions up
front and we halt the build with an error message. 

2. It's not one single build, it is a big chained build that is causing excess
classloadings. You can replicate it by checking out 

svn co https://smartfrog.svn.sourceforge.net/svnroot/smartfrog/trunk/core
smartfrog-core 
then running "ant cruise"

3. It's not really ant itself that is leaking, or even the tasks, more the fact
that if every build file reloads tasks (so it works self-contained), the tasks
hang around after that subant-initiated build terminates. 
Comment 4 Stephane Bailliez 2007-06-27 13:57:46 UTC
Note that I have the same problem with our build at the company, but I learned
from an early age not to trust this type of meta build and to fork individual
builds instead.
Comment 5 Steve Loughran 2007-06-28 03:45:00 UTC
Well, perhaps we need a forking subant. Hmmm. 

Having done -v runs to see what is going on, I am <typedef>ing the ivy antlib and
smartfrog as a tasks.properties file, both of which are ignored with the
(ignoring redeclaration of ... ) messages. But somehow the classloader is being
retained.
Comment 6 Peter Reilly 2007-07-16 06:57:13 UTC
Hi Steve,
Thanks, I can now repeat the problem
with the following build file:

<project name="e" default="run">
  <property name="ac"
            location="${user.home}/apps/ant-contrib/ant-contrib-1.0b3.jar"/>
  <target name="run">
    <typedef resource="net/sf/antcontrib/antlib.xml"
             classpath="${ac}"/>
    <for begin="1" end="1000" param="p">
      <sequential>
        <antcall target="define"/>
        <echo>@{p}</echo>
      </sequential>
    </for>
  </target>
  <target name="define">
    <typedef resource="net/sf/antcontrib/antlib.xml"
             classpath="${ac}"/>
  </target>
</project>
If I set ANT_OPTS to:
export ANT_OPTS="-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:SurvivorRatio=2"

The build completes, but painfully slowly (after ~300 iterations the
slowdown is very noticeable); the time reported for
the build is ~30 minutes.

If I set ANT_OPTS to:
export ANT_OPTS="-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:SurvivorRatio=2 
-XX:MaxPermSize=8m -XX:PermSize=8m"
The build completes relatively quickly (~ 2 1/2 minutes)
(see : http://developers.sun.com/mobility/midp/articles/garbagecollection2/#3.3
for use and description of GC flags)
I conclude that for this build file, the problem is not
ant, but it is the GC in java and its treatment of classes.



It may be useful to add a check in <typedef> to
see if the same typedef has been done before - this
may cause other problems (the contents of the
jars may have changed - adding new classes or antlib definitions).

As Stephane does, I also normally fork build files to
avoid similar (and other) problems.

I use the following  macro:

  <macrodef name="sub">
    <attribute name="dir"/>
    <attribute name="target"/>
    <sequential>
      <exec executable="bash"
            dir="@{dir}"
            failonerror="yes">
        <arg value="-c"/>
        <arg value="ant -emacs @{target}"/>
      </exec>
    </sequential>
  </macrodef>

Comment 7 Steve Loughran 2007-07-16 09:22:36 UTC
1. A small test is very welcome :)

2. The problem is that the JRE hangs on to loaded classes in a separate part of
the heap, and does so until all instances of the class are gone. So we need to
somehow make sure we have no instances of typedef'd stuff hanging around, or
references to it. 
Comment 8 Peter Reilly 2007-07-16 10:44:34 UTC
Hi Steve,
what I am saying is that there may be a bug with GC of
classes. I have seen this happen with older versions of java.

I cannot get "ant cruise" to work on the checked out smartfrog
- the problem is "build.xml:284: Cruise Control was not found in
/home/peter/svn/main)"

I tried ant dist, this worked without a problem (linux fedora 7, jdk1.7).
Comment 9 Steve Loughran 2007-07-17 02:15:14 UTC
Peter

1. try ant cc instead; ant cruise turns out to try and run CC

2. I've patched our common.xml to not redeclare the ivy tasks if they are
already defined, so a full build no longer runs out of memory.

3. but it was, on Java 1.6
Comment 10 Peter Reilly 2007-07-17 05:06:48 UTC
Thanks Steve,
I see the bug now.
My fix for the GC did not work on it.
Comment 11 Peter Reilly 2007-09-18 09:23:44 UTC
I have tracked down a problem with Ivy.
It uses an IvyContext to store information. This
uses a thread local variable to achieve
global variable semantics.
It appears that this object does not get GCed,
and as it contains objects that are classes loaded
by the AntClassLoader, the AntClassLoader also
does not get GCed.

Comment 12 Steve Loughran 2007-09-19 02:18:07 UTC
1. Is there an ivy bug # to track?

2. Is there something we can do in Ant to assist in this?

It sounds like Ivy needs to listen for build completion and purge its state when
a build finishes.
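
A minimal sketch of the pattern under discussion, assuming a hypothetical TaskContext class standing in for IvyContext (this is not Ivy's code, and the names are made up for illustration). It shows why a thread-local context outlives a sub-build and what a purge hook would have to do:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-thread context such as IvyContext. */
final class TaskContext {
    // A static ThreadLocal entry lives as long as its thread. In a chained build
    // the main thread outlives every sub-build, so the stored context stays
    // reachable, and with it the classes and classloader behind it.
    private static final ThreadLocal<TaskContext> CURRENT = new ThreadLocal<>();

    private final Map<String, Object> state = new HashMap<>();

    private TaskContext() { }

    /** Returns the calling thread's context, creating one on first use. */
    static TaskContext current() {
        TaskContext ctx = CURRENT.get();
        if (ctx == null) {
            ctx = new TaskContext();
            CURRENT.set(ctx);
        }
        return ctx;
    }

    /** True if the calling thread currently holds a context. */
    static boolean isBound() {
        return CURRENT.get() != null;
    }

    void put(String key, Object value) { state.put(key, value); }

    Object get(String key) { return state.get(key); }

    /** Drops the calling thread's context; meant to run when a (sub-)build ends. */
    static void purge() {
        CURRENT.remove();
    }
}
```

In Ant the purge() call could plausibly be wired to BuildListener.buildFinished or SubBuildListener.subBuildFinished; whether that is enough to let the AntClassLoader be collected depends on there being no other references to the task classes.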
Comment 13 Peter Reilly 2007-09-19 02:31:51 UTC
1) no
2) I raised the issue on the Ivy dev mailing list.
 >It sounds like Ivy needs to listen for build completion and purge its state when
 >a  build finishes
Yes, I tried that (listening for subproject build ending and clearing
IvyContext) and it seems to work (I have problems with the ant cc for
smartfrog - 1) the system tests fail and 2) ivy 2.0.0 trunk sees
problems with the dependencies (missing javadoc artifacts))

However there is another problem with the implementation of IvyContext
which I raised on the Ivy mailing list. The way it is implemented
means that sub-projects will wipe the context of master projects
(if the same classloader is used for ivy in the sub-projects).
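
One way to avoid a sub-project wiping its master's context is to keep a per-thread stack and have each build push on entry and pop on exit. The sketch below only illustrates that idea; ScopedContext and its methods are hypothetical names, not Ivy's implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical scoped context: one stack of contexts per thread. */
final class ScopedContext {
    private static final ThreadLocal<Deque<ScopedContext>> STACK = new ThreadLocal<>();

    final String owner;

    private ScopedContext(String owner) {
        this.owner = owner;
    }

    /** Enters a new scope, e.g. at the start of a sub-build. */
    static ScopedContext push(String owner) {
        Deque<ScopedContext> stack = STACK.get();
        if (stack == null) {
            stack = new ArrayDeque<>();
            STACK.set(stack);
        }
        ScopedContext ctx = new ScopedContext(owner);
        stack.push(ctx);
        return ctx;
    }

    /** Returns the innermost context, or null when no scope is open. */
    static ScopedContext current() {
        Deque<ScopedContext> stack = STACK.get();
        return stack == null ? null : stack.peek();
    }

    /** Leaves the innermost scope; removes the thread-local once the stack is empty. */
    static void pop() {
        Deque<ScopedContext> stack = STACK.get();
        if (stack == null) {
            return;
        }
        stack.poll();
        if (stack.isEmpty()) {
            STACK.remove();
        }
    }
}
```

With this shape, a sub-build that pushes and later pops leaves the master's context in place, and the last pop releases the thread-local so nothing is left pinned on the thread.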
Comment 14 Peter Reilly 2007-09-19 02:42:29 UTC
One way ant could help would be not to use a new classloader
in the case where the path for the new task/type definition is
the same as the current definition. At the moment this
is treated as a "similar" definition, which overrides the
current definition with a new classloader, but does not
inform the world:

  project.log("Trying to override old definition of "
          + (isTask ? "task " : "datatype ") + name,
          (def.similarDefinition(old, project))
          ? Project.MSG_VERBOSE : Project.MSG_WARN);

The reason for using a new classloader is that some of the jars or directories
may have changed since the last <typedef/>; however, I do not think
that this happens for real (and on Windows, changing the jar while it
is used in a classloader is not easy).

However, this will not solve the general problem as the master project
may not have loaded the tasks/types.
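
A rough sketch of the "reuse the classloader when the path is the same" idea, assuming a hypothetical LoaderCache helper (Ant has no class of this name, and this is not how Ant's definition handling is written). Including each entry's timestamp in the key is my assumption, added to cover the case where a jar has changed since the last <typedef/>:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that hands out one classloader per distinct classpath. */
final class LoaderCache {
    private final Map<List<String>, ClassLoader> loaders = new HashMap<>();

    /** Returns the loader already created for this exact classpath, or creates one. */
    synchronized ClassLoader loaderFor(List<File> classpath) {
        List<String> key = new ArrayList<>();
        for (File entry : classpath) {
            // Assumption: a changed timestamp means the jar was rebuilt, so the
            // key differs and a fresh loader is created.
            key.add(entry.getAbsolutePath() + "@" + entry.lastModified());
        }
        ClassLoader loader = loaders.get(key);
        if (loader == null) {
            loader = new URLClassLoader(toUrls(classpath), LoaderCache.class.getClassLoader());
            loaders.put(key, loader);
        }
        return loader;
    }

    /** Number of distinct loaders created so far. */
    synchronized int size() {
        return loaders.size();
    }

    private static URL[] toUrls(List<File> classpath) {
        URL[] urls = new URL[classpath.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = classpath.get(i).toURI().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad classpath entry: " + classpath.get(i), e);
            }
        }
        return urls;
    }
}
```

The cache itself holds strong references to its loaders, so it would need to be owned by something that goes away at the end of the whole build, otherwise it just moves the leak.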
 
Comment 15 Steve Loughran 2007-09-19 04:09:59 UTC
>I have problems with the ant cc for
>smartfrog - 1) the system tests fail and 2) ivy 2.0.0 trunk sees
>problems with the dependencies (missing javadoc artifacts))

I'll look at this. The javadocs should get published in the build. 

The system tests do run on Cruise Control, but it skips some of the web tests as
port 8080 is already in use on that machine. Email me the error messages and
I'll look at them.
Comment 16 Xavier Hanin 2007-11-13 03:18:18 UTC
I've created an issue in Ivy related to this problem:
https://issues.apache.org/jira/browse/IVY-639

I've just checked in a fix, the current trunk version should not have the memory
leak and subproject handling problem anymore. But I don't have a good test case
to test this out, so if one of you who already investigated the issue could give
it a try, it would be great!
Comment 17 Steve Loughran 2007-11-14 03:55:07 UTC
I think there are/were two problems here. First, redefining stuff with <taskdef>
causes/caused leaks. Second, ivy itself was consuming stuff. I can put some
switches into our build file to redefine the ivy tasks without checking for
them being present, and test against the latest code. Is there a new alpha/beta
of Ivy available for me to do this?
Comment 18 kahmyong.moon 2013-10-11 21:17:11 UTC
I don't know if there's been any recent action on this, but after hitting a similar issue (with Ant 1.7.2 through 1.9.2, and Ivy 2.3.0), and poking around heap dumps for a while, I've noticed two potential issues:

1. On the Ant side, IntrospectionHelper.bean holds a reference to the Ivy classes. Bug 30162 fixed a similar issue with embedded Ants by calling IntrospectionHelper.clearCache() at the end of a build, but this doesn't cover sub-builds.

2. On the Ivy side, IvyContext.getContext() automatically pushes an IvyContext if one does not already exist (Message.getLogger() seems to do this pretty early). Since this automatic IvyContext doesn't follow the scoping rules introduced in https://issues.apache.org/jira/browse/IVY-639, this means there's one IvyContext left hanging around after the subbuild ends.

Anyway, I'll probably go with the forking workaround mentioned by others in this thread, but it would be nice to get the root issues fixed...