Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.0.15
- Fix Version/s: None
Description
I just discovered a memory issue (java.lang.OutOfMemoryError: Java heap space) that happens only when calling stripper.getText(pdfFile) on a PDF whose fonts are not embedded (like the attached one).
To reproduce the issue, run this snippet against the attached PDF file:
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.IOException;
import java.io.InputStream;

public class OutOfMemoryExample {
    public static void main(String[] args) throws IOException {
        try (InputStream docStream = Thread.currentThread().getContextClassLoader()
                     .getResource("ceh.pdf").openStream();
             PDDocument cd = PDDocument.load(docStream)) {
            PDFTextStripper stripper = new PDFTextStripper();
            // OutOfMemory here
            String pdfText = stripper.getText(cd);
            System.out.println(pdfText);
        }
    }
}
Attachments
Issue Links
- Blocked: PDFBOX-4489 Memory issue on org.apache.fontbox.ttf.GlyphSubstitutionTable.readLangSysTable(GlyphSubstitutionTable.java:147) (Closed)