PDFBox / PDFBOX-1769

Fix crash on invalid xref

Details

    • Type: Wish
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.2
    • Fix Version/s: 1.8.4, 2.0.0
    • Component/s: Parsing
    • Labels: None

    Description

      When the xref start address recorded in the file is invalid, the parser needs to search for the correct one instead of crashing.

      Example file:
      http://digitalcorpora.org/corp/nps/files/govdocs1/020/020747.pdf

      Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ref'
      at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1622)

      Using the code:
      PDFTextStripper ts = new PDFTextStripper();
      PrintWriter out = new PrintWriter(new FileWriter(new File(pFile + ".txt")));
      // Scratch file used by the non-sequential parser (RandomAccess types from org.apache.pdfbox.io)
      RandomAccess scratchFile = new RandomAccessFile(File.createTempFile("pdfbox-", ".tmp"), "rw");
      PDDocument doc = PDDocument.loadNonSeq(new File(pFile), scratchFile);
      ts.setForceParsing(true);
      ts.writeText(doc, out);

      Related: PDFBOX-1757
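
      The request above (search for a correct xref start address) could be sketched roughly as follows. This is not the fix that was committed to PDFBox, only an illustration under stated assumptions: the class XrefLocator and its methods are hypothetical names, the whole file is assumed to fit in memory, and only classic "xref" tables are considered (cross-reference streams are ignored). The idea is to trust the offset after "startxref" only if the keyword "xref" really sits there, and otherwise to fall back to the "xref" keyword closest to the declared offset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: locates a plausible xref table offset.
class XrefLocator {

    /** Returns the offset written after the last "startxref", or -1 if none can be read. */
    static long declaredOffset(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so string indices equal byte offsets.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        int k = s.lastIndexOf("startxref");
        if (k < 0) {
            return -1;
        }
        int i = k + "startxref".length();
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        int j = i;
        while (j < s.length() && Character.isDigit(s.charAt(j))) {
            j++;
        }
        return j > i ? Long.parseLong(s.substring(i, j)) : -1;
    }

    /** True if a standalone "xref" keyword (not the tail of "startxref") begins at pos. */
    static boolean isXrefAt(String s, int pos) {
        if (pos < 0 || !s.startsWith("xref", pos)) {
            return false;
        }
        return pos < 5 || !s.startsWith("start", pos - 5);
    }

    /** Returns the declared offset if valid, else the nearest "xref" keyword, else -1. */
    static long findXref(byte[] pdf) {
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        long declared = declaredOffset(pdf);
        if (declared >= 0 && declared < s.length() && isXrefAt(s, (int) declared)) {
            return declared;
        }
        // Fallback: scan every "xref" keyword and keep the one closest to the declared offset.
        // If no offset was declared (-1), this ends up choosing the first keyword in the file.
        long best = -1;
        for (int p = s.indexOf("xref"); p >= 0; p = s.indexOf("xref", p + 1)) {
            if (!isXrefAt(s, p)) {
                continue;
            }
            if (best < 0 || Math.abs(p - declared) < Math.abs(best - declared)) {
                best = p;
            }
        }
        return best;
    }
}
```

      A caller would read the file into a byte array, call XrefLocator.findXref(bytes), and start xref parsing at the returned offset when it is not -1. Choosing the nearest keyword is a heuristic; a real parser would also need to validate the table found there.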

            People

              Assignee: Andreas Lehmkühler (lehmi)
              Reporter: William Palmer (willp-bl)
              Votes: 0
              Watchers: 4