PDFBox / PDFBOX-2186

java.io.IOException: Catalog cannot be found

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.6, 1.8.7, 2.0.0
    • Fix Version/s: 1.8.7, 2.0.0
    • Component/s: Parsing
    • Labels:
      None

      Description

      I get this with the attached file:

      Jul 04, 2014 5:41:00 PM org.apache.pdfbox.pdfparser.PDFParser parseXrefTable
      Warnung: invalid xref line: 0000000000 65535    f
      Jul 04, 2014 5:41:00 PM org.apache.pdfbox.pdfparser.PDFParser parseXrefTable
      Warnung: Count in xref table is 0 at offset 334372
      Jul 04, 2014 5:41:00 PM org.apache.pdfbox.pdfparser.NonSequentialPDFParser initialParse
      Warnung: Expected trailer object at position 334373, keep trying
      Exception in thread "main" java.io.IOException: Catalog cannot be found
              at org.apache.pdfbox.cos.COSDocument.getCatalog(COSDocument.java:522)
              at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:482)
              at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:757)
              at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1157)
              at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:197)
              at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:89)

      The cause is a TAB character in an xref line. The fix is to split the line on the regex \s (any whitespace character) instead of on a space character only.
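
      The difference between the two splitting strategies can be sketched in a few lines of Java. This is only an illustration, not the actual PDFBox parser code: the class XrefLineSplitDemo and its two helper methods are hypothetical names, and the sample line assumes that the separator before the "f" flag is a single TAB.

```java
import java.util.Arrays;

// Hypothetical demo class; it is not part of PDFBox.
class XrefLineSplitDemo {

    // Strict variant: splits on the space character only.
    static String[] splitOnSpace(String line) {
        return line.split(" ");
    }

    // Lenient variant: splits on any single whitespace character
    // (space, TAB, and so on) using the regex \s.
    static String[] splitOnWhitespace(String line) {
        return line.split("\\s");
    }

    public static void main(String[] args) {
        // Assumed sample: an xref entry whose last separator is a TAB.
        String line = "0000000000 65535\tf";

        // The strict split leaves "65535\tf" as one token, so the entry
        // does not have the three expected fields.
        System.out.println(Arrays.toString(splitOnSpace(line)));

        // The lenient split yields offset, generation number and flag.
        System.out.println(Arrays.toString(splitOnWhitespace(line)));
    }
}
```

      With the strict split the sample line produces two tokens; with the lenient split it produces three. Note that \s matches one whitespace character at a time, so a run of several whitespace characters would still produce empty tokens.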

      I'm not touching the preflight parser (which has the same line of code) because I assume that it should not be lenient.

      Source of the file:
      http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/002.zip file 959

        Attachments

        1. PDFBOX-2186.pdf
          332 kB
          Tilman Hausherr

            People

            • Assignee: Tilman Hausherr
            • Reporter: Tilman Hausherr