Description
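When the xref start address recorded in a file is wrong, the parser needs to search for the correct one. Parsing the example file currently fails with the exception shown.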
Example file:
http://digitalcorpora.org/corp/nps/files/govdocs1/020/020747.pdf
Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ref'
    at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1622)
Using the code:
// RandomAccess and RandomAccessFile are the org.apache.pdfbox.io types
PDFTextStripper ts = new PDFTextStripper();
PrintWriter out = new PrintWriter(new FileWriter(new File(pFile + ".txt")));
RandomAccess scratchFile = new RandomAccessFile(File.createTempFile("pdfbox-", ".tmp"), "rw");
PDDocument doc = PDDocument.loadNonSeq(new File(pFile), scratchFile);
ts.setForceParsing(true);
ts.writeText(doc, out);
Related: PDFBOX-1757
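The search described above could look roughly like the following. This is a minimal sketch, not PDFBox code: the class XrefLocator and its methods are hypothetical names made up for illustration, and it assumes a classic cross-reference table introduced by the "xref" keyword (cross-reference streams are not handled). If the offset given after "startxref" does not point at "xref", it falls back to the standalone "xref" keyword nearest to that offset.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of PDFBox. */
final class XrefLocator {
    private static final byte[] XREF = "xref".getBytes(StandardCharsets.ISO_8859_1);
    private static final byte[] START = "start".getBytes(StandardCharsets.ISO_8859_1);

    /** True if keyword occurs in data starting exactly at pos. */
    static boolean matchesAt(byte[] data, int pos, byte[] keyword) {
        if (pos < 0 || pos + keyword.length > data.length) {
            return false;
        }
        for (int i = 0; i < keyword.length; i++) {
            if (data[pos + i] != keyword[i]) {
                return false;
            }
        }
        return true;
    }

    /** True if a standalone "xref" (not the tail of "startxref") begins at pos. */
    static boolean isXrefAt(byte[] data, int pos) {
        return matchesAt(data, pos, XREF)
                && !matchesAt(data, pos - START.length, START);
    }

    /**
     * Returns claimedOffset if it already points at "xref"; otherwise the
     * offset of the standalone "xref" keyword closest to claimedOffset,
     * or -1 if the data contains none.
     */
    static int findXrefOffset(byte[] data, int claimedOffset) {
        if (isXrefAt(data, claimedOffset)) {
            return claimedOffset;
        }
        int best = -1;
        long bestDistance = Long.MAX_VALUE;
        for (int pos = 0; pos + XREF.length <= data.length; pos++) {
            if (isXrefAt(data, pos)) {
                long distance = Math.abs((long) pos - claimedOffset);
                if (distance < bestDistance) {
                    best = pos;
                    bestDistance = distance;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tiny made-up file whose startxref value (12) is wrong; "xref" is at 9.
        String pdf = "%PDF-1.4\nxref\n0 1\n0000000000 65535 f \n"
                + "trailer\n<< /Size 1 >>\nstartxref\n12\n%%EOF\n";
        byte[] data = pdf.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findXrefOffset(data, 12)); // prints 9
    }
}
```

A linear scan is chosen here for simplicity. A real fix would also need to validate the table found at the corrected offset before trusting it.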
Issue Links
- is depended upon by
  - PDFBOX-1541 expected='endstream' actual='' failure to parse (Closed)
  - PDFBOX-1812 Illegal characters in XML output (Closed)
  - PDFBOX-1818 Push back buffer is full error (Closed)