Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
3.0.0 PDFBox
None
Environment: MacOS, but likely not OS specific.
Description
Version: org.apache.pdfbox:pdfbox:3.0.0-alpha3
In a subset of the PDFs I process, I cannot extract a range of pages and write them out to a new PDF (this is part of my test code).
Here's the Kotlin code I use:
import java.nio.file.Path
import java.nio.file.Paths
import org.apache.pdfbox.multipdf.PageExtractor

fun extractPages(documentName: String, fromPage: Int, toPage: Int): Path {
    val pdfFile = Paths.get("data", "input", "PDFS", "${documentName}.pdf")
    val pdfPagesFile = Paths.get("data", "input", "PDFS", "${documentName}_Page_$fromPage-$toPage.pdf")
    val pdfDoc = org.apache.pdfbox.Loader.loadPDF(pdfFile.toFile())
    val pageExtractor = PageExtractor(pdfDoc, fromPage, toPage)
    val pdfPages = pageExtractor.extract()
    // The StackOverflowError below is thrown from this save() call.
    pdfPages.save(pdfPagesFile.toFile())
    return pdfPagesFile
}
It doesn't occur with every PDF; roughly 10-20% of the PDFs I use are affected.
A slice of the stack trace:
java.lang.StackOverflowError
    at java.base/java.util.HashMap.tableSizeFor(HashMap.java:380)
    at java.base/java.util.HashMap.<init>(HashMap.java:453)
    at java.base/java.util.LinkedHashMap.<init>(LinkedHashMap.java:347)
    at java.base/java.util.HashSet.<init>(HashSet.java:162)
    at java.base/java.util.LinkedHashSet.<init>(LinkedHashSet.java:154)
    at org.apache.pdfbox.util.SmallMap.entrySet(SmallMap.java:380)
    at org.apache.pdfbox.cos.COSDictionary.entrySet(COSDictionary.java:1225)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:336)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
    at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
As mentioned, this hits only some PDFs, not all.
For legal reasons I cannot share the original source PDFs, but the trace looks like unbounded recursion between writeObject, writeCOSDictionary and writeCOSArray in COSWriterObjectStream.
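To illustrate the kind of loop the trace suggests, here is a minimal, self-contained sketch. It does not use PDFBox: `Node`, `Name`, `Arr`, `Dict`, `writeNaive` and `writeGuarded` are hypothetical stand-ins I made up, and this is not the actual PDFBox code or its fix. It assumes the object graph contains a cycle (for example a child dictionary whose /Parent entry leads back to the page) and that the writer descends into it as if every object were direct.

```kotlin
import java.util.Collections
import java.util.IdentityHashMap

// Hypothetical stand-ins for COS objects; these are NOT PDFBox classes.
sealed class Node
class Name(val value: String) : Node()
class Arr(val items: MutableList<Node> = mutableListOf()) : Node()
class Dict(val entries: MutableMap<String, Node> = linkedMapOf()) : Node()

// Naive writer: recurses into every child, so a cyclic graph never terminates.
fun writeNaive(node: Node, out: StringBuilder) {
    when (node) {
        is Name -> out.append('/').append(node.value)
        is Arr -> for (item in node.items) writeNaive(item, out)
        is Dict -> for ((_, value) in node.entries) writeNaive(value, out)
    }
}

// Guarded writer: remembers (by identity) which containers are on the current
// recursion path and emits a placeholder instead of descending into them again.
fun writeGuarded(
    node: Node,
    out: StringBuilder,
    onPath: MutableSet<Node> = Collections.newSetFromMap(IdentityHashMap<Node, Boolean>())
) {
    if (node is Name) {
        out.append('/').append(node.value)
        return
    }
    if (!onPath.add(node)) {
        // Already being written further up the call stack: we looped back.
        out.append("<cycle>")
        return
    }
    when (node) {
        is Arr -> {
            out.append("[ ")
            for (item in node.items) {
                writeGuarded(item, out, onPath)
                out.append(' ')
            }
            out.append(']')
        }
        is Dict -> {
            out.append("<< ")
            for ((key, value) in node.entries) {
                out.append('/').append(key).append(' ')
                writeGuarded(value, out, onPath)
                out.append(' ')
            }
            out.append(">>")
        }
        is Name -> Unit // handled above
    }
    onPath.remove(node)
}

fun main() {
    // page -> /Kids array -> child -> /Parent -> page
    val page = Dict()
    val child = Dict()
    page.entries["Type"] = Name("Page")
    page.entries["Kids"] = Arr(mutableListOf(child))
    child.entries["Parent"] = page

    val overflowed = try {
        writeNaive(page, StringBuilder())
        false
    } catch (e: StackOverflowError) {
        true
    }
    println("naive writer overflowed: $overflowed")

    val out = StringBuilder()
    writeGuarded(page, out)
    println(out)
}
```

In the naive version the dictionary/array/dictionary call pattern repeats until the stack is exhausted, which matches the shape of the trace above. A real PDF writer would normally emit an indirect reference rather than a placeholder at the point where the guarded version prints `<cycle>`; whether that is what goes wrong here is only a guess without the source PDFs.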