-
Type:
Bug
-
Status: Resolved
-
Priority:
Critical
-
Resolution: Fixed
-
Affects Version/s: 1.20, 1.19.1
-
Fix Version/s: None
-
Component/s: parser
-
Labels:None
-
Environment:
Reproduced on Windows 2012 R2 and Ubuntu 18.04.
Java: jdk1.8.0_151
I have an application that extracts text from multiple files on a file share. I've been running into issues with the application running out of memory (~26g dedicated to the heap).
I found in the heap dumps there is a "fDTDDecl" buffer which is creating very large char arrays and never releasing that memory. In the picture you can see the heap dump with 4 SAXParsers holding onto a large chunk of memory. The fourth one is expanded to show it is all being held by the "fDTDDecl" field. This dump is from a scaled down execution (not a 26g heap).
It looks like that DTD field should never be that large, I'm wondering if this is a bug with xerces instead? I can easily reproduce the issue by attempting to extract text from large .pst files.