PDFBox: PDFBOX-5480

PDDocument.load throws IOException on certain PDF files


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.25, 2.0.26
    • Fix Version/s: None
    • Component/s: Parsing, PDModel
    • Labels: None
    • Environment: Ubuntu 20.04.4 LTS
      Java OpenJDK 11.0.12-open

    Description

      I use PDDocument in my application and noticed that the load method throws an IOException ("Error: End-of-File, expected line") for certain PDF files, such as the attached one.

      My code:

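      import java.io.InputStream;
      import java.util.List;

      import org.apache.pdfbox.io.MemoryUsageSetting;
      import org.apache.pdfbox.pdmodel.PDDocument;

      // Resource, Edition, PdfUtils and splitAndSavePages are application code, not part of PDFBox.
      protected List<String> getLocalPages(final Resource completeEditionResource, final Edition edition, final int firstPage) throws Exception {
          PDDocument document = null;
          try {
              // load() parses from the stream's current position; if the stream has already
              // been read to the end, the parser finds no data to read.
              final InputStream in = completeEditionResource.getInputStream();
              document = PDDocument.load(in, MemoryUsageSetting.setupTempFileOnly());
              PdfUtils.disableImageCache(document);
              return splitAndSavePages(document, firstPage, completeEditionResource, edition.getPublishedDate());
          } finally {
              if (document != null) {
                  document.close();
              }
              // Note: this calls getInputStream() again, which may return a different stream than `in`.
              completeEditionResource.getInputStream().reset();
          }
      }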

      Exception thrown:

      java.io.IOException: Error: End-of-File, expected line
          at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1107)
          at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2650)
          at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2633)
          at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
          at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1230)
          at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1148)
          at com.flip.CompletePdfAnalyzer.getLocalPages(CompletePdfAnalyzer.java:162)
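
The trace shows the failure in COSParser.parseHeader, that is, while PDFBox looks for the %PDF- header at the start of the data. As far as the PDFBox 2.0.x sources indicate, BaseParser.readLine throws "Error: End-of-File, expected line" when it is asked for another line but the input is already at its end, so this error most likely means the parser received no bytes at all, or reached the end without finding a header. A minimal stdlib-only check, assuming the data can be buffered in memory (the attachment is 2.62 MB), is to read the stream into a byte array and inspect it before parsing. The class and method names below are hypothetical, and the 1024-byte window is an arbitrary choice for this sketch. Because the check consumes the stream, the same byte array should then be handed to the parser (for example via PDDocument.load(byte[])) instead of the original stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PdfHeaderCheck {

    // Hypothetical helper: summarizes what a PDF parser would see at the start of the data.
    static String describe(byte[] data) {
        if (data.length == 0) {
            return "empty: no bytes available (stream already consumed or never filled)";
        }
        int n = Math.min(data.length, 1024); // arbitrary inspection window for this sketch
        String head = new String(data, 0, n, StandardCharsets.ISO_8859_1);
        return head.contains("%PDF-")
                ? "found %PDF- header within first " + n + " bytes"
                : "no %PDF- header within first " + n + " bytes";
    }

    // Reads the whole stream once (and closes it); callers keep the returned bytes.
    static byte[] readAll(InputStream in) {
        try (InputStream source = in) {
            return source.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An empty stream reproduces the "nothing to parse" situation.
        System.out.println(describe(readAll(new ByteArrayInputStream(new byte[0]))));
        // A buffer starting with a PDF header.
        System.out.println(describe("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII)));
    }
}
```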

      To verify that the input stream was correct, I successfully saved it to a file with FileUtils.copyInputStreamToFile from Apache Commons IO just before calling PDDocument.load.
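
If the diagnostic copy is made from the same stream that is later passed to PDDocument.load, the copy reads that stream to its end (and, according to the Commons IO Javadoc, copyInputStreamToFile also closes the source stream), leaving the parser with no data or a closed stream. The stdlib-only sketch below illustrates this, using Files.copy as a stand-in for FileUtils.copyInputStreamToFile; the class and method names are hypothetical. It then shows one way to keep a diagnostic copy without consuming the data the parser needs: read the bytes once and create a fresh ByteArrayInputStream (or call PDDocument.load(byte[])) for parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ConsumedStreamDemo {

    // Copies the stream to a temp file (stand-in for FileUtils.copyInputStreamToFile)
    // and returns how many bytes a later reader of the SAME stream would still get.
    static int bytesLeftAfterCopy(InputStream in) {
        try {
            Path tmp = Files.createTempFile("debug-copy", ".pdf");
            try {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING); // reads to end of stream
                return in.readAllBytes().length;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Alternative: read the data once, keep the bytes, and create a fresh stream per consumer.
    static byte[] bufferOnce(InputStream in) {
        try (InputStream source = in) {
            return source.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "%PDF-1.7\n%sample bytes only, not a real PDF\n".getBytes(StandardCharsets.US_ASCII);

        // Same stream used for the diagnostic copy and then for "parsing": nothing is left.
        System.out.println("left after copy: " + bytesLeftAfterCopy(new ByteArrayInputStream(sample)));

        // Buffered once: the diagnostic copy and the parser each get their own view of the bytes.
        byte[] bytes = bufferOnce(new ByteArrayInputStream(sample));
        Path debugFile = Files.createTempFile("debug-copy", ".pdf");
        Files.write(debugFile, bytes);
        InputStream forParser = new ByteArrayInputStream(bytes); // e.g. PDDocument.load(forParser)
        System.out.println("available to parser: " + forParser.available());
        Files.deleteIfExists(debugFile);
    }
}
```

Buffering in memory is reasonable for a file of this size; for much larger files, writing the copy to disk first and then loading from that file with PDDocument.load(File) would avoid holding the whole document in memory.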

      Attachments

        1. example.pdf (2.62 MB), attached by Patrick Davila Kochan

        People

          Assignee: Unassigned
          Reporter: Patrick Davila Kochan (patrick_kochan)
          Votes: 0
          Watchers: 2
