Issue Details (XML | Word | Printable)

Key: JAMES-592
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Critical Critical
Assignee: Stefano Bagnara
Reporter: Norman Maurer
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
JAMES Server

OOM caused by unbounded cache in InetAddress (was James leaks memory slowly)

Created: 21/Aug/06 02:04 PM   Updated: 21/Nov/07 08:31 AM
Component/s: None
Affects Version/s: 2.2.0, 2.3.0
Fix Version/s: 2.3.1

Time Tracking:
Not Specified

Issue Links:
Reference
 

Resolution Date: 06/Feb/07 10:53 AM


 Description  « Hide
Noel wrote on list:
I do not know where in the application it is happening, but after running
JAMES non-stop since Fri Aug 11 03:29:57 EDT 2006, this morning the JVM
started to throw OutOfMemoryError exceptions, such as:

21/08/06 08:39:47 WARN mailstore: Exception retrieving mail:
java.lang.RuntimeException: Exception caught while retrieving an object,
cause: java.lang.OutOfMemoryError, so we're deleting it.

That did not recover, so it wasn't just due to a transient large allocation
(which I limit, anyway), so there is definitely something leaking, albeit
slowly. Keep in mind that the store was one of the victims, but not
necessarily the cause.

The JVM process size had steadily grown from a somewhat stable 114MB to
130MB last night. I did not look at it this morning before restarting the
server.

        --- Noel


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Norman Maurer added a comment - 21/Aug/06 02:04 PM
One thing to review:

  https://list.scms.waikato.ac.nz/pipermail/wekalist/2006-June/007468.html

I do use org.gjt.mm.mysql.Driver, which is the only software in the JVM not
part of the standard JAMES 2.3.0 distribution.

See also: http://bugs.mysql.com/bug.php?id=20003

Norman Maurer added a comment - 21/Aug/06 02:05 PM
After reading the bugreport and have a quick look at our code it seems that we need to add "ResultSet.close()" to our code.

Norman Maurer added a comment - 21/Aug/06 02:43 PM
Ok after recheck our code it seems that we did all correct. So any idea is welcome :-(

Bernd Fondermann added a comment - 22/Aug/06 10:06 AM
Maybe we can confirm a db/mysql specific problem by running identical scenarios on file/derby/mysql and compare mem consumption.
At best, the scenario would mimmick Noel's production load (scaled up, to accelerate the process), if he likes to disclose this information.

Stefano Bagnara added a comment - 24/Aug/06 11:08 AM
What mysql server version do you use? What connector/j version do you use?
If you're not running the latest connector/j upgrade it, otherwise try adding "dontTrackOpenResources=true" to your jdbc url. If with the latest driver and no option the memory leak is there and adding the option the memory leak disappear then we probably miss some resultset/connection/statement close in some path of our code, otherwise tell us the results.

Norman Maurer added a comment - 02/Sep/06 01:57 PM
This OOM disappear after noel change to a new mysql connector.

Noel J. Bergman added a comment - 11/Sep/06 01:39 AM
Re-opening the issue, since it still exists and was prematurely closed.

Noel J. Bergman added a comment - 11/Sep/06 01:41 AM
This isn't, and probably never was, about MySQL. JAMES leaks memory, and we're not really sure why, yet.

Noel J. Bergman added a comment - 11/Sep/06 01:50 AM
As per comments on mailing list:

Unfortunately, I once again confirm (this time using the new memstat check in RemoteManager) that we are losing about 2MB per day. And it does seem related to the days, not the amount of mail.

I wonder about the memory consumed by those Phoenix artifacts that showed up in the prior logs. Considering that both Stefano and Norman run with larger heaps than I do, and that at least Stefano rotates logs relatively infrequently compared to my daily rotation, it is entirely possible that they would not see such leaks for extended periods of time. Also, load testing wouldn't help in this case, since it seems to be independent of load.

I am still prepared to release, but we will have to document this issue.

To check for this in your environment, once a day, run memstat -gc (new to 2.3.0 RC3) in the RemoteManager, and record the available heap after the garbage collection. I have done this for the past 3 days, and see ~2MB less memory available each day.

Noel J. Bergman added a comment - 11/Sep/06 02:43 AM
Changing priority based upon JIRA definitions. This would be either critical or major, depending upon one's interpretation of the severity of the memory leak.

Stefano Bagnara added a comment - 11/Sep/06 07:35 AM
If you believe the problem is in the log rotation you may want to try replacing the DefaultLoggerManager with ExtendedLog4jLoggerManager in your kernel.xml.
If the problem is in logkit or in its manager it won't leak anymore.

Stefano Bagnara added a comment - 03/Nov/06 02:30 PM
I think this is related to what Serge reported here:
http://mail-archives.apache.org/mod_mbox/james-server-dev/200604.mbox/%3C140e061f0604251100s3820ba63t583200f0fa679043@mail.gmail.com%3E
http://www.lokitech.com/~sergek/james/dnsleak/index

Maybe you should try starting phoenix with this property:
-Dsun.net.inetaddr.ttl=10
(the documented property for jvm 14 is "-Dnetworkaddress.cache.ttl=10" but it seems this won't work)

Noel J. Bergman added a comment - 03/Nov/06 02:55 PM
Yes. We discussed Serge's e-mail at the time (the InetAddres part was old news, even then), have aleady made changes related to the dnsjava cache, and were supposed to have eliminated all uses of InetAddress for resolution. It just seems that in v2.3.0, we missed a few places that still populated the InetAddress cache instead of calling dnsjava.

We have known about this problem for years, which is why we had previously set about removing all uses of InetAddress for resolution from the code. We just missed some. See also http://www.rgagnon.com/javadetails/java-0445.html, which explains the problem you had (as also noted by http://java.sun.com/j2se/1.4.2/docs/api/java/net/InetAddress.html for JDK 1.4 and http://java.sun.com/j2se/1.5.0/docs/guide/net/properties.html for JDK 5, the non-Sun properties are security policy properties).

The proper solution is to use dnsjava for everything, and avoid anything that impacts the InetAddress cache.

Stefano Bagnara added a comment - 06/Nov/06 12:16 AM
Updated the subject of the issue to be more specific now that Noel found out the source of the problem.

Noel J. Bergman added a comment - 09/Nov/06 05:51 PM
Issue was caused by a single InetAddress.getHostName() call in SMTPHandler.java (also present in each of the other handlers). Resolved by replacing with a call to the DNSServer, which uses dnsjava, and doesn't suffer from the same problems as Sun's InetAddress implementation.

Stefano Bagnara added a comment - 18/Nov/06 07:14 PM
I reopen this while we decide what kind of fix backport to each of our branches.
Also removing Trunk from fix versions as everything we have currenlty in trunk will be in next-major so it is more clean if we use only next-major for versions. Like we did previously for v2.3 vs trunk.

Stefano Bagnara added a comment - 06/Feb/07 10:53 AM
Now james uses a default 300 seconds expiration for the positive dns results cache.
The expiration is tunable via system property -Dnetworkaddress.cache.ttl
Setting it to -1 will revert to the default "cache forever" JVM behaviour.
Setting it to 0 will remove caching at all.
Applied to trunk 4 days ago and backported to v2.3 (for 2.3.1) now.

Danny Angus added a comment - 21/Nov/07 08:31 AM
Closing issue fixed in released version.