Details
Description
Unable to convert certain pages to images for certain PDF documents. Getting error: java.lang.ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
method for converting page is called this way: BufferedImage image = page.convertToImage(BufferedImage.TYPE_3BYTE_BGR, 300); // where page is of type org.apache.pdfbox.pdmodel.PDPage
Full stacktrace (of relevant part):
java.lang.ClassCastException: org.apache.pdfbox.cos.COSNull cannot be cast to org.apache.pdfbox.cos.COSDictionary
at org.apache.pdfbox.pdmodel.graphics.xobject.PDCcitt.getRGBImage(PDCcitt.java:119)
at org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:78)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
at org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:130)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:107)
at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:722)
at eu.eudml.enhancement.pdf2textviaocr.PdfImageExtractor.extractImagesUsingPdfParser(PdfImageExtractor.java:236)