I am experiencing a very similar problem, but with a significantly larger impact. Attached is a zip containing a binary built with the patch pulled in (which makes effective use of hashCode() and equals()); below is a snippet of the binary's manifest:
Ant-Version: Apache Ant(TM) version 1.8.3 compiled on February 26 2012
Created-By: 1.6.0_32-ea (Sun Microsystems Inc.)
I have a compressed input file of roughly 25M (24.4 MiB) containing XML. Compression is done with the java.util.zip compression/decompression APIs using the default strategy, and I am sure the file inflates to somewhere close to 500M.
A simple piece of code, deployed as a webapp in Tomcat 6.14 through Tomcat 7.0.50 (with Java 1.6.30 and Java 1.6.32), reads in the compressed file and runs an XML parser on it. It takes nearly 30 minutes to parse fully on a laptop with a 4-core 2.5 GHz i5 processor (nothing in this process is parallelized for optimization). I have checked and confirmed this both with the Xerces binaries (2.6 and 2.11) explicitly added so that Xerces handles the entire parse, and with the JDK's default parser implementation, which behaves much the same as Xerces.
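For anyone trying to reproduce the setup, here is a minimal sketch of the read-and-parse step. It assumes the file is a zlib stream written with `DeflaterOutputStream` and parsed through SAX; the report only says java.util.zip with the default strategy, so if the file is actually GZIP, `GZIPInputStream` would replace `InflaterInputStream`. The class and method names are hypothetical, not the reporter's code, and the sketch uses try-with-resources, so it needs Java 7+ rather than the Java 1.6 runtimes mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CompressedXmlParse {

    // Compress with java.util.zip at the default level and with DEFAULT_STRATEGY.
    static byte[] deflate(byte[] input) throws IOException {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setStrategy(Deflater.DEFAULT_STRATEGY);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes, deflater)) {
            out.write(input);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not ended by close()
        }
        return bytes.toByteArray();
    }

    // Inflate on the fly and stream-parse with SAX, counting start tags,
    // so the inflated document is never held in memory as a whole.
    static int countElements(InputStream compressed) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        try (InputStream xml = new InflaterInputStream(compressed)) {
            parser.parse(xml, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    count[0]++;
                }
            });
        }
        return count[0];
    }

    // Convenience wrapper: compress an XML string, then inflate and parse it back.
    static int roundTripCount(String xml) {
        try {
            byte[] packed = deflate(xml.getBytes(StandardCharsets.UTF_8));
            return countElements(new ByteArrayInputStream(packed));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripCount("<root><item/><item/><item/></root>")); // 4
    }
}
```

Streaming through SAX avoids holding the ~500M inflated document in memory, which helps keep parser time separate from memory pressure when profiling.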
Across multiple runs, multiple profiling tools identified the Xerces code below as the hotspot that chokes everything. The cause is inefficient nested looping in that code over very large indexes (potentially in the MBs), which matches the code's own comment:
// REVISIT: we can improve performance by using hash codes, instead of
// traversing global vector that could be quite large.
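The REVISIT comment describes a linear search over an ever-growing vector. The sketch below illustrates the general difference with a hypothetical `KeyTuple` type; it is not the Xerces code or the attached patch, only an example of why defining `hashCode()` and `equals()` and switching to a hash lookup turns an O(n²) duplicate check into an expected O(n) one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    // Hypothetical value tuple (e.g. the field values of one key).
    // equals/hashCode are consistent, so it can be used as a hash key.
    static final class KeyTuple {
        private final String[] fields;

        KeyTuple(String... fields) {
            this.fields = fields.clone();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof KeyTuple && Arrays.equals(fields, ((KeyTuple) o).fields);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(fields);
        }
    }

    // Linear scan over a growing list: each check is O(n), so n checks cost O(n^2).
    static int countDuplicatesLinear(List<KeyTuple> input) {
        List<KeyTuple> seen = new ArrayList<>();
        int duplicates = 0;
        for (KeyTuple t : input) {
            boolean found = false;
            for (KeyTuple s : seen) {
                if (s.equals(t)) {
                    found = true;
                    break;
                }
            }
            if (found) duplicates++; else seen.add(t);
        }
        return duplicates;
    }

    // Hash-based check: add() returns false if an equal tuple is already present,
    // giving expected O(1) per check and O(n) overall.
    static int countDuplicatesHashed(List<KeyTuple> input) {
        Set<KeyTuple> seen = new HashSet<>();
        int duplicates = 0;
        for (KeyTuple t : input) {
            if (!seen.add(t)) duplicates++;
        }
        return duplicates;
    }

    // Builds n tuples cycling through `distinct` values, so n - distinct are duplicates (n >= distinct).
    static List<KeyTuple> sample(int n, int distinct) {
        List<KeyTuple> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(new KeyTuple("id", Integer.toString(i % distinct)));
        }
        return list;
    }

    public static void main(String[] args) {
        List<KeyTuple> input = sample(8000, 6000);
        long t0 = System.nanoTime();
        int linear = countDuplicatesLinear(input);
        long t1 = System.nanoTime();
        int hashed = countDuplicatesHashed(input);
        long t2 = System.nanoTime();
        System.out.println("linear: " + linear + " duplicates in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("hashed: " + hashed + " duplicates in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods find the same duplicates; only the cost of each membership check differs, and that gap widens quadratically as the number of stored values grows.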
[NOTE] Interestingly, the same code (with both the JDK and the Xerces implementation) runs perfectly, within a minute, from Eclipse and even from a plain "java -classpath ... ParserTest", with no significant JVM hotspots. This makes me wonder whether Tomcat is doing something internally during parsing.
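When Eclipse and plain `java` are fast but Tomcat is slow, one thing worth ruling out is that a different JAXP implementation is picked up inside the webapp. Below is a small diagnostic sketch (the class name is hypothetical) that prints which factory classes JAXP resolves and where they were loaded from. Running it both from the command line and from inside the webapp would show whether the two environments use the same parser; it is only a check, not an explanation of the slowdown.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class JaxpImplCheck {

    // Concrete SAXParserFactory class that JAXP resolves in the current environment
    // (system property, service-provider lookup, or the JDK's built-in default).
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Same lookup for the DOM side.
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("SAXParserFactory = " + saxFactoryClass());
        System.out.println("DocumentBuilderFactory = " + domFactoryClass());
        // Location the factory class came from; null usually means the JDK's own classes.
        System.out.println("loaded from = " + SAXParserFactory.newInstance()
                .getClass().getProtectionDomain().getCodeSource());
    }
}
```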
With the patched binaries I can now run it within a minute inside Tomcat as well; the binary pair can be used as a drop-in replacement by anyone facing this problem.
[ATTENTION] On a different note, with the existing Xerces binaries, if the application re-processes the XML files, even in a different thread, it severely impacts the execution of other operational threads: the entire webapp starts freezing randomly, and parsing strangely takes even longer (close to 2x), even with enough memory allocated. I am not sure whether the issue would persist on other application servers such as GlassFish or Jetty, or whether it is specific to Tomcat.