Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-3923

Expected a long type at offset 52152, instead got 'xref'

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.0.7
    • Fix Version/s: 2.0.8, 3.0.0 PDFBox
    • Component/s: Parsing
    • Labels:
      None
    • Environment:
      Java 1.8.0_144-b01

      Description

      This reads as a duplicate of PDFBOX-2441, PDFBOX-3179 and several others marked as resolved in 2.0.0 however this bug is reproducible in PDFBOX 2.0.0 as well as PDFBOX 2.0.7.

      The attached PDF file is parsable by Chrome (PDFium), Mozilla (pdf.js), Edge, Windows 10 Reader and Adobe Acrobat but fails using PDFBOX 2.0.0 and PDFBOX 2.0.7 with the following error.

      Exception in thread "main" java.io.IOException: Error: Expected a long type at offset 52152, instead got 'xref'
              at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1358)
              at org.apache.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1286)
              at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:760)
              at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:742)
              at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:673)
              at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:633)
              at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:241)
              at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:276)
              at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1011)
              at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:949)
              at org.apache.pdfbox.tools.PrintPDF.main(PrintPDF.java:140)
              at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:72)
      Caused by: java.lang.NumberFormatException: For input string: "xref"
              at java.lang.NumberFormatException.forInputString(Unknown Source)
              at java.lang.Long.parseLong(Unknown Source)
              at java.lang.Long.parseLong(Unknown Source)
              at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1353)
              ... 11 more
      

      We do not generate this PDF file so we are unaware of the origin but the creator has given permission to share this file publicly for troubleshooting purposes. We can ask any questions to the creator upon request.

        Attachments

        1. P17090403580.pdf
          52 kB
          Tres Finocchiaro

          Issue Links

            Activity

              People

              • Assignee:
                tilman Tilman Hausherr
                Reporter:
                tresf Tres Finocchiaro
              • Votes:
                1 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: