Description
I have several journal PDFs where the last xref section starts like this:
endobj xref
0 92
0000000000 65535 f
0000000044 00000 n
In these cases the PDF parser reads the endobj line completely and then unreads " xref".
However, the newline that ended the line (in this case ^D) is lost. This is already documented for the
method readline() within PDFParser:
"Note: if you later unread the results of this function, you'll
need to add a newline character to the end of the string."
Currently I get an error like "expected='obj' actual='655'": with the newline gone, the unread " xref" runs straight into the "0 92" of the following line, so the keyword is read as 'xref0'.
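To illustrate why 'xref' turns into 'xref0', here is a minimal Java sketch that replays the sequence on a plain java.io.PushbackInputStream. The readLine, readToken and tokenAfterEndobj helpers are hypothetical simplifications, not PDFBox's actual parser code, and the sample input assumes a '\n' line ending.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class EndobjUnreadDemo {

    // Hypothetical stand-in for the parser's readline(): returns the line text and
    // consumes the end-of-line byte without including it in the result.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n' && c != '\r') {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Hypothetical token reader: skips leading whitespace, collects bytes up to the
    // next whitespace, and pushes that delimiter back for the next read.
    static String readToken(PushbackInputStream in) throws IOException {
        int c = in.read();
        while (c == ' ' || c == '\n' || c == '\r') {
            c = in.read();
        }
        StringBuilder sb = new StringBuilder();
        while (c != -1 && c != ' ' && c != '\n' && c != '\r') {
            sb.append((char) c);
            c = in.read();
        }
        if (c != -1) {
            in.unread(c);
        }
        return sb.toString();
    }

    // Reads the "endobj xref" line, unreads only the text after "endobj" (as the
    // unpatched parser does) and returns the next token the parser would see.
    static String tokenAfterEndobj(String pdfText) {
        try {
            PushbackInputStream in = new PushbackInputStream(
                    new ByteArrayInputStream(pdfText.getBytes(StandardCharsets.ISO_8859_1)), 16);
            String line = readLine(in);                          // "endobj xref", newline consumed
            String trailing = line.substring("endobj".length()); // " xref"
            in.unread(trailing.getBytes(StandardCharsets.ISO_8859_1));
            return readToken(in);                                // stream now reads " xref0 92..."
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling tokenAfterEndobj("endobj xref\n0 92\n0000000000 65535 f\n") yields "xref0", which matches the symptom described above.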
The fix:
In PDFParser, insert the following lines before line 579 (where the characters trailing 'endobj' are unread):
// add a space first in place of the newline consumed by readline()
pdfSource.unread( SPACE_BYTE );
Thus we get:
if (endObjectKey.startsWith( "endobj" ) )
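The space has to be unread before the trailing characters because a pushback stream returns the most recently unread bytes first. A minimal sketch, using plain java.io.PushbackInputStream in place of PDFBox's pdfSource (the afterPushback method is hypothetical and needs Java 9+ for readAllBytes), shows that this order puts the space after " xref":

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class UnreadOrderDemo {

    // Simulates the stream right after readline() consumed "endobj xref" and its newline:
    // only "0 92\n" is left. Pushes back " xref" plus one space in the chosen order and
    // returns everything the next reader would see.
    static String afterPushback(boolean spaceFirst) {
        byte[] rest = "0 92\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] trailing = " xref".getBytes(StandardCharsets.ISO_8859_1);
        try (PushbackInputStream in =
                     new PushbackInputStream(new ByteArrayInputStream(rest), 16)) {
            if (spaceFirst) {
                in.unread(' ');      // proposed fix: unread the space first ...
                in.unread(trailing); // ... so " xref" lands in front of it
            } else {
                in.unread(trailing);
                in.unread(' ');      // wrong order: the space ends up before " xref"
            }
            return new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With spaceFirst=true the stream reads " xref 0 92\n", so 'xref' is separated from the object count again; with the opposite order it reads "  xref0 92\n" and the bug remains.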
Issue Links
- duplicates PDFBOX-818 PDFParser fails if object/xref starts at same line as endobj of a stream object (Closed)