Linux 2.4.28 + Java 1.4.2_09 + Tomcat 4.1.31
I have what I believe to be a relatively lean and mean search application that uses Velocity Tools, and after 2 months I started getting the dreaded OutOfMemoryException. I started looking for anything obvious that could be the problem, and I found that I was using ImportTool.read in a few places, and my log files were full of exceptions that occurred during those operations, because the URLs were bad or whatever.
Just for the heck of it, I decided to look at how the ImportTool actually imported it's text, since it's a pretty heavy operation and if it was failing alot perhaps it was contributing to my OOM woes. So, I dove into the code to see how everything worked. The ImportTool class is trivial... the real meat is in ImportSupport, where I found some things that we might want to consider.
1. [trivial] In ImportSupport.acquireReader, an InputStreamReader is created with the character set loaded from the remote resource (makes sense). In the case where Java doesn't support the encoding, we use the default encoding. Unfortunately, the code uses catch(Exception) instead of catch(UnsupportedEncodingException). If some other error occurred, it would be swallowed, because there is no error reporting in this case.
2. [confusing] This comment appears just before the end of ImportSupport.acquireReader:
// check response code for HTTP URLs before returning, per spec,
// before returning
It is unclear which spec the author is referencing. It could be the HTTP spec (unlikely) or the HttpURLConnection spec (where I can't find any hints). It might be nice to clear this up.
Further, it appears that the author was trying to check the status in order to throw an error if there wasn't an "OK" response. This code appears /after/ an input stream is retrieved from the request.
It turns out that the class that you ultimately get back from opening an HTTP URL is sun.net.www.protocol.http.HttpURLConnection, which will throw an exception if you call getInputStream if the status of the connection is anything >= 400. That means that many of the error conditions are already handled by the time this status code check executes.
Pretty much anything under 400 should be considered an okay response, no?
3. [potentially dangerous] There is never any resource cleanup of the HTTP connection. This is actually complicated due to the way that the methods were written: acquireReader creates the connection and returns a Reader object, which is tied to the HttpURLConnection. aquireString knows nothing about the HttpURLConnection and can therefore not cleanup. The reader itself gets closed, but that does not guarentee that the connection itself gets cleaned up.
There's not even cleanup code in the acquireReader method itself to close the connection if we throw an exception ourselves (for example, because we don't like the status code).
These problems leave potential, if not actual, resource leaks. Sure, the GC will hopefully come along eventually and cleanup the mess that was left, but it's not guarenteed to do so in a timely mannar. Things like InputStreams and HTTP connections are probably relatively heavy objects and should be cleaned up whenever possible.
4. [bad style?] All of the exceptions explicitly created and thrown from ImportSupport are of type java.lang.Exception. I feel like we could come up with some better exceptions to throw.
I am happy to do the work required to fix all these issues – I'm not just a whiner I wanted to have a discussion about how the project leaders felt about each item and whether or not they were worth fixing.