PDFBox / PDFBOX-498

some pdf-files have no newline after endobj, pdfbox fails on that

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.3
    • Fix Version/s: 0.8.0-incubator
    • Component/s: Parsing
    • Labels: None

    Description

      We have some PDF documents in which some endobj keywords are not followed by a newline; the next object number follows immediately instead (for example "endobj28 0"). This is the same situation as in PDFBOX-195. PDFBox throws an IOException when it encounters it.
      Stacktrace:
      java.io.IOException: expected='endobj' firstReadAttempt='endobj28' secondReadAttempt='0' org.apache.pdfbox.io.PushBackInputStream@a37368
      at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:534)
      at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:167)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:630)
      at org.apache.pdfbox.pdfparser.TestPDFParser.testParsingTroublePDFs(TestPDFParser.java:98)
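
      For illustration, a tolerant read of the endobj keyword could look roughly like the sketch below. This is not the attached PDFParser.java.diff and not PDFBox code: the class EndObjSketch and its methods are hypothetical names, and it uses only java.io.PushbackInputStream from the JDK. It assumes the stream was created with a pushback buffer large enough to hold the glued-on tail plus one byte. When the token starts with "endobj" and continues with a digit, the extra characters are pushed back so that the next read sees the object number.

```java
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class EndObjSketch {
    private static final String ENDOBJ = "endobj";

    // PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // Skips leading whitespace and reads one whitespace-delimited token.
    static String readToken(PushbackInputStream in) throws IOException {
        int c;
        while ((c = in.read()) != -1 && isWhitespace(c)) {
            // skip leading whitespace
        }
        StringBuilder sb = new StringBuilder();
        while (c != -1 && !isWhitespace(c)) {
            sb.append((char) c);
            c = in.read();
        }
        if (c != -1) {
            in.unread(c); // leave the delimiter for the next reader
        }
        return sb.toString();
    }

    // Accepts "endobj" and also "endobj<digits...>", pushing the tail back.
    static void expectEndObj(PushbackInputStream in) throws IOException {
        String token = readToken(in);
        if (token.equals(ENDOBJ)) {
            return;
        }
        if (token.startsWith(ENDOBJ)) {
            char next = token.charAt(ENDOBJ.length());
            if (next >= '0' && next <= '9') {
                // No newline after endobj: the tail belongs to the next object.
                byte[] tail = token.substring(ENDOBJ.length())
                        .getBytes(StandardCharsets.ISO_8859_1);
                in.unread(tail);
                return;
            }
        }
        throw new IOException("expected='endobj' actual='" + token + "'");
    }
}
```

      With the input "endobj28 0 obj" and a pushback buffer of, say, 64 bytes, expectEndObj returns normally and the following readToken calls yield "28", "0" and "obj". Restricting the tail to a leading digit is a deliberate choice in this sketch, so that unrelated garbage after endobj is still reported as an error.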

      Attachments

        1. PDFParser.java.diff
          1 kB
          Daan de Wit
        2. endobj-no-newline.pdf
          502 kB
          Daan de Wit

          People

            Assignee: Unassigned
            Reporter: Daan de Wit (d.de.wit)