Description
I use PDDocument in my application and noticed that its load method throws an IOException (Error: End-of-File, expected line) for certain PDF files, such as the one in the attachment.
My code:
protected List<String> getLocalPages(final Resource completeEditionResource, final Edition edition, final int firstPage) throws Exception {
    PDDocument document = null;
    try {
        final InputStream in = completeEditionResource.getInputStream();
        // The IOException is thrown by this call
        document = PDDocument.load(in, MemoryUsageSetting.setupTempFileOnly());
        PdfUtils.disableImageCache(document);
        return splitAndSavePages(document, firstPage, completeEditionResource, edition.getPublishedDate());
    } finally {
        if (document != null) {
            document.close();
        }
        completeEditionResource.getInputStream().reset();
    }
}
Exception thrown:
java.io.IOException: Error: End-of-File, expected line
    at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1107)
    at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2650)
    at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2633)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1230)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1148)
    at com.flip.CompletePdfAnalyzer.getLocalPages(CompletePdfAnalyzer.java:162)
To verify that the input stream was correct, I successfully saved the PDF to a file using FileUtils.copyInputStreamToFile from Apache Commons IO just before calling PDDocument.load.
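One thing that may be worth ruling out, as a possibility and not a confirmed diagnosis: if the verification copy and PDDocument.load read from the same InputStream instance, the copy leaves the stream at its end, and a later reader sees end-of-file immediately. The sketch below uses only java.io to show that behaviour. The class name StreamReuseCheck, the helper drain, and the nine-byte stand-in for a PDF header are hypothetical and are not taken from the reported application or from PDFBox.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamReuseCheck {

    /** Reads the stream to its end and returns the number of bytes consumed. */
    public static int drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        int total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the real PDF bytes
        byte[] fakePdf = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(fakePdf);

        // Plays the role of the verification copy: consumes the whole stream
        int copied = drain(in);
        System.out.println("bytes consumed by first reader: " + copied);

        // What a second reader of the same instance sees next (-1 means EOF)
        System.out.println("next read on same stream: " + in.read());

        // Rewinding only works when the stream supports mark/reset
        if (in.markSupported()) {
            in.reset();
            System.out.println("first byte after reset: " + (char) in.read());
        }
    }
}
```

If this turns out to match the situation, obtaining a fresh stream for the load call, or loading from the file that the verification step already wrote, would avoid sharing one partially consumed stream. Whether Resource.getInputStream() returns a new stream on each call depends on the Resource implementation in use.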