  1. Xerces2-J
  2. XERCESJ-1653

Memory leak with validating SAX Parser

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.11.0
    • Fix Version/s: None
    • Component/s: SAX
    • Labels: None
    • Environment: Windows 7 Enterprise, JDK 1.8.0_25

    Description

      I'm parsing a very large XML file with org.apache.xerces.parsers.SAXParser and validation turned on. The file contains 25 million elements of the form specified in the attached DTDs and is about 7 GB in total.

      Heap monitoring with jvisualvm shows millions of QName instances being cached and not being garbage collected.

      Turning off validation makes the problem disappear.

      I have tested a number of other parsers (Crimson, Aelfred2, Resin, Woodstox). With Woodstox, for example, I can process my 7 GB file (including validation) with just 64 MB of heap. With Xerces, 1024 MB of heap is not enough.

      I'm attaching a small diagnostic program (SAXMemoryUsage.java) which shows that Xerces heap consumption keeps growing as the parse proceeds.
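
      For readers without access to the attachment, the kind of measurement described above can be sketched roughly as follows. This is not the attached SAXMemoryUsage.java; the class name SaxHeapProbe, the inline DTD, and the element names list and item are hypothetical stand-ins for the attached DTDs. The sketch goes through JAXP, so it exercises Apache Xerces only if xercesImpl is on the classpath; otherwise the JDK's built-in parser is used. Heap readings taken this way are noisy because no garbage collection is forced, so only the overall trend is meaningful.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHeapProbe {

    /** Streams a synthetic document so the input itself never sits in memory. */
    static final class GeneratedXml extends InputStream {
        // Hypothetical DTD, declared inline so no external files are needed.
        private static final byte[] HEAD = (
                "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE list [\n"
                + "<!ELEMENT list (item*)>\n"
                + "<!ELEMENT item (#PCDATA)>\n"
                + "]>\n<list>").getBytes(StandardCharsets.US_ASCII);
        private static final byte[] ITEM =
                "<item>x</item>".getBytes(StandardCharsets.US_ASCII);
        private static final byte[] TAIL =
                "</list>".getBytes(StandardCharsets.US_ASCII);

        private final long items;
        private long emitted = 0;
        private byte[] chunk = HEAD;
        private int pos = 0;
        private boolean tailSent = false;

        GeneratedXml(long items) {
            this.items = items;
        }

        @Override
        public int read() {
            // Move to the next chunk once the current one is exhausted.
            while (pos == chunk.length) {
                if (emitted < items) {
                    chunk = ITEM;
                    emitted++;
                } else if (!tailSent) {
                    chunk = TAIL;
                    tailSent = true;
                } else {
                    return -1; // end of document
                }
                pos = 0;
            }
            return chunk[pos++] & 0xFF;
        }
    }

    /** Parses a generated document and returns the number of elements seen. */
    static long parse(long items, boolean validate, long reportEvery) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate); // DTD validation on or off
        SAXParser parser = factory.newSAXParser();
        final long[] count = {0};
        parser.parse(new GeneratedXml(items), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (++count[0] % reportEvery == 0) {
                    Runtime rt = Runtime.getRuntime();
                    long usedMb = (rt.totalMemory() - rt.freeMemory()) >> 20;
                    System.out.printf("validate=%b elements=%d usedHeap=%d MB%n",
                            validate, count[0], usedMb);
                }
            }

            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // surface validation errors instead of ignoring them
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        long items = args.length > 0 ? Long.parseLong(args[0]) : 2_000_000L;
        parse(items, false, 500_000); // baseline without validation
        parse(items, true, 500_000);  // same input with validation
    }
}
```

      Running it with a small fixed heap (for example java -Xmx64m SaxHeapProbe 25000000) and comparing the two passes should show whether used heap stays flat or climbs with the element count when validation is on.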

      Attachments

        1. _subelem.dtd
          0.2 kB
          Sebastian Millies
        2. 1653-cleanup.patch
          0.6 kB
          Jan Tošovský
        3. 1653-map.patch
          4 kB
          Jan Tošovský
        4. elemlist.dtd
          0.2 kB
          Sebastian Millies
        5. SAXMemoryUsage.java
          5 kB
          Sebastian Millies
        6. SAXMemoryUsage.log
          9 kB
          Sebastian Millies

          People

            Assignee: Unassigned
            Reporter: Sebastian Millies (s.millies@ids-scheer.de)
            Votes: 1
            Watchers: 3

            Dates

              Created:
              Updated: