[PDFBOX-536] missing iterator.hasNext() test in PDFXrefStreamParser - ASF JIRA

The class org.apache.pdfbox.pdfparser.PDFXrefStreamParser uses an unbounded iterator in its parser method.

Specifically, line 100 should be changed from:

while(pdfSource.available() > 0)

to:

while(pdfSource.available() > 0 && objIter.hasNext())

Without this check, the loop can keep running after objIter is exhausted, and line 115 then throws a NoSuchElementException.
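The failure mode can be reproduced outside PDFBox with a minimal, self-contained sketch. This is not the actual PDFXrefStreamParser code: the class XrefGuardSketch, the method readEntries, and the guarded flag are hypothetical names chosen for illustration, and the sketch assumes the loop consumes one iterator element per pass while the stream still has bytes available.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class XrefGuardSketch {

    // Hypothetical stand-in for the parser loop: pairs each byte read from
    // the stream with the next object number taken from the iterator.
    // When guarded is false, only the stream is checked (the reported bug).
    static List<String> readEntries(InputStream in, Iterator<Integer> objIter,
                                    boolean guarded) throws IOException {
        List<String> entries = new ArrayList<>();
        while (in.available() > 0 && (!guarded || objIter.hasNext())) {
            int objNum = objIter.next(); // throws if the iterator is exhausted
            int data = in.read();
            entries.add(objNum + ":" + data);
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {10, 20, 30};               // three bytes of stream data
        List<Integer> objNums = Arrays.asList(1, 2); // but only two object numbers

        try {
            readEntries(new ByteArrayInputStream(bytes), objNums.iterator(), false);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded: NoSuchElementException");
        }

        // With the hasNext() check the loop stops once the iterator runs out.
        System.out.println("guarded: "
                + readEntries(new ByteArrayInputStream(bytes), objNums.iterator(), true));
    }
}
```

In this sketch the guarded loop stops quietly and leaves the remaining stream bytes unread. Whether leftover data should be ignored or reported is a separate design question that the proposed one-line change does not address.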

I will attach a test file that triggers the problem (during text extraction), along with a patched version of PDFXrefStreamParser.java.

relates to

PDFBOX-533 PDFTextStripper.writeCharacters is called no where in the class