Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 2.0.8
Description
I am using PDFBox version 2.0.8. I am trying to render scanned PDFs, but some of them fail to render and produce an error. Native PDFs render without any trouble. Most of my scanned PDFs also render fine, but a couple result in an error (one is attached).
This is the code I used to render the pdf.
try (PDDocument document = load(file)) {
    logger.debug("start generate image file " + pageNumber + " for " + name);
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    return getPageImage(pdfRenderer, pageNumber, name, storageId);
}
The call to getPageImage above runs the following code:
File imageFile = File.createTempFile(StringUtils.toFilename(storageId) + "_" + pageNumber, ".png");
imageFile.deleteOnExit();
final BufferedImage image = pdfRenderer.renderImageWithDPI(pageNumber - 1, dpi, ImageType.RGB);
ImageIO.write(image, "png", imageFile);
logger.debug("completed generate image file " + pageNumber + " for " + name);
return imageFile;
The error occurs in the second snippet, on this line:
final BufferedImage image = pdfRenderer.renderImageWithDPI(pageNumber - 1, dpi, ImageType.RGB);
The stack trace is as follows:
Caused by: java.io.IOException: Error: Expected operator 'ID' actual='In'
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:305) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:502) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:203) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:145) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94) ~[pdfbox-2.0.8.jar:2.0.8]
    at com.sustain.document.PdfPageGenerator.getPageImage(PdfPageGenerator.java:70) ~[classes/:?]
    at com.sustain.document.PdfPageGenerator.getPageImage(PdfPageGenerator.java:59) ~[classes/:?]
Since native PDFs rendered without issue, I initially thought that scanned PDFs in general were the problem. But other scanned PDFs render fine, so I am unsure what causes some files to render and others to fail.
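One way to narrow this down might be to render every page separately and record which ones throw, instead of stopping at the first failure. The following is only a minimal sketch of that idea: the class `PageFailureScan` and the interface `PageRenderer` are hypothetical names introduced for illustration, and `PageRenderer` merely stands in for a call such as `pdfRenderer.renderImageWithDPI(pageIndex, dpi, ImageType.RGB)`. The `main` method uses a simulated renderer rather than a real PDF.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class PageFailureScan {

    /**
     * Hypothetical stand-in for the real rendering call, e.g.
     * pdfRenderer.renderImageWithDPI(pageIndex, dpi, ImageType.RGB).
     */
    @FunctionalInterface
    public interface PageRenderer {
        void render(int pageIndex) throws IOException;
    }

    /**
     * Tries to render every page and collects the failures.
     * Keys are 1-based page numbers, values are the exception messages.
     */
    public static Map<Integer, String> findFailingPages(int pageCount, PageRenderer renderer) {
        Map<Integer, String> failures = new LinkedHashMap<>();
        for (int pageIndex = 0; pageIndex < pageCount; pageIndex++) {
            try {
                renderer.render(pageIndex);
            } catch (IOException e) {
                // Record the failure and continue with the next page.
                failures.put(pageIndex + 1, e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Simulated renderer (assumption): the page at index 1 fails
        // with the same message as in the stack trace above.
        Map<Integer, String> failures = findFailingPages(3, pageIndex -> {
            if (pageIndex == 1) {
                throw new IOException("Error: Expected operator 'ID' actual='In'");
            }
        });
        failures.forEach((page, message) ->
                System.out.println("page " + page + ": " + message));
    }
}
```

In the code above, this could presumably be called as `findFailingPages(document.getNumberOfPages(), i -> pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB))`. Knowing exactly which pages fail would make it easier to compare their content streams with those of the scanned pages that render correctly.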