Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 2.11.0
Fix Version/s: None
Component/s: None
Environment: Windows 7 Enterprise, JDK 1.8.0_25
Description
I'm parsing a very large XML file with org.apache.xerces.parsers.SAXParser and validation turned on. The file contains 25 million elements of the form specified in the attached DTDs and is about 7 GB in size.
Heap monitoring with jvisualvm shows millions of QName instances being cached and not being garbage collected.
Turning off validation makes the problem disappear.
I have tested a number of other parsers (Crimson, Aelfred2, Resin, Woodstox). With Woodstox, for example, I can process my 7 GB file (including validation) with just 64 MB of heap. With Xerces, 1024 MB of heap is not enough.
I'll attach a small diagnosis program (SAXMemoryUsage.java) that shows that Xerces heap consumption increases inordinately.
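For readers without access to the attachment, here is a minimal sketch of the kind of measurement described above. It is not the attached SAXMemoryUsage.java. The class name SaxHeapProbe, the root/item DTD and the synthetic input are made up for illustration, and the parser is obtained through JAXP rather than by instantiating org.apache.xerces.parsers.SAXParser directly. With xercesImpl on the classpath, JAXP should normally select Xerces; the program prints the reader class so this can be checked.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxHeapProbe {

    /** Lazily generates a DTD-valid document with `items` child elements (hypothetical schema). */
    static InputStream syntheticDocument(final int items) {
        final String header = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE root [\n"
            + "<!ELEMENT root (item*)>\n"
            + "<!ELEMENT item (#PCDATA)>\n"
            + "<!ATTLIST item id CDATA #REQUIRED>\n"
            + "]>\n<root>";
        return new SequenceInputStream(new Enumeration<InputStream>() {
            private int next = -1; // -1 = header, 0..items-1 = elements, items = closing tag

            public boolean hasMoreElements() {
                return next <= items;
            }

            public InputStream nextElement() {
                String chunk;
                if (next < 0) {
                    chunk = header;
                } else if (next < items) {
                    chunk = "<item id=\"" + next + "\">x</item>";
                } else {
                    chunk = "</root>";
                }
                next++;
                return new ByteArrayInputStream(chunk.getBytes(StandardCharsets.UTF_8));
            }
        });
    }

    /** Parses the synthetic document with validation on; returns the number of item elements seen. */
    static long parseAndCount(int items, final int reportEvery) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation, the setting that triggers the reported growth
        XMLReader reader = factory.newSAXParser().getXMLReader();
        System.out.println("Parser in use: " + reader.getClass().getName());

        final long[] count = {0};
        final Runtime rt = Runtime.getRuntime();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!"item".equals(qName)) {
                    return;
                }
                count[0]++;
                if (count[0] % reportEvery == 0) {
                    // Rough figure only: includes garbage that has not been collected yet.
                    long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
                    System.out.println(count[0] + " elements, ~" + usedMb + " MB heap in use");
                }
            }

            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // surface validity errors instead of silently ignoring them
            }
        };
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(syntheticDocument(items)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        int items = args.length > 0 ? Integer.parseInt(args[0]) : 1000000;
        System.out.println("Parsed " + parseAndCount(items, 100000) + " elements");
    }
}
```

Running this with a small fixed heap (for example -Xmx64m) and comparing the printed figures with setValidating(true) and setValidating(false) should indicate whether heap use grows with the number of elements. The synthetic document is much simpler than the real 7 GB file, so it may not exercise the same code paths.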