Description
Hi,
I encountered an issue in a production environment that caused a disk full error.
Tika uses the PDFParser with the "forceParsing" boolean set to true in order to continue the parsing even if an error occurs.
Two PDFs contain an object number greater than the maximum int value (Integer.MAX_VALUE), so the readInt() method fails.
Because "forceParsing" is set, the parser tries to move on to the next object, but it cannot: on error, the readInt() method backtracks over the bytes it has read, so
the "skipToNextObj" method does nothing and we try to parse the same object indefinitely.
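To illustrate the loop described above, here is a simplified model, not the actual PDFBox code. The class StuckParserSketch, its readInt() and the countFailedAttempts() helper are hypothetical stand-ins. It assumes the reader pushes the digits back onto the stream on failure and that the skip step does not advance, as described in this report. The retry count is capped so the sketch terminates.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class StuckParserSketch {

    // Hypothetical stand-in for the parser's readInt(): reads decimal digits
    // and pushes them back onto the stream when they do not fit in an int.
    static int readInt(PushbackInputStream in) throws IOException {
        StringBuilder digits = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && Character.isDigit(c)) {
            digits.append((char) c);
        }
        if (c != -1) {
            in.unread(c); // put the delimiter back
        }
        try {
            return Integer.parseInt(digits.toString());
        } catch (NumberFormatException e) {
            // Backtrack: the stream is now exactly where it was before the call.
            in.unread(digits.toString().getBytes(StandardCharsets.ISO_8859_1));
            throw new IOException("Not an int: " + digits, e);
        }
    }

    // Retries the read up to maxAttempts times and counts the failures.
    // The skip step is modelled as a no-op (assumption taken from this report).
    static int countFailedAttempts(String data, int maxAttempts) throws IOException {
        byte[] bytes = data.getBytes(StandardCharsets.ISO_8859_1);
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(bytes), bytes.length + 1);
        int failures = 0;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                readInt(in);
                break; // parsed successfully, stop retrying
            } catch (IOException e) {
                failures++; // position unchanged, so the next attempt fails too
            }
        }
        return failures;
    }
}
```

With an object number such as 4294967296, every attempt fails on the same bytes; without the cap the loop would never end.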
The COSObjectKey object already uses a long as its object number, so we should read a long instead of an int during parsing, using a "readLong" method, in order to handle object numbers that are too large for an int.
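A minimal sketch of what such a "readLong" method could look like is given here. It is an assumption about the shape of the fix, not the actual PDFBox implementation. The class ReadLongSketch and the parseObjectNumber() helper are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class ReadLongSketch {

    // Hypothetical readLong(): same digit scan as an int reader,
    // but parsed with Long.parseLong so values above Integer.MAX_VALUE are accepted.
    static long readLong(PushbackInputStream in) throws IOException {
        StringBuilder digits = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && Character.isDigit(c)) {
            digits.append((char) c);
        }
        if (c != -1) {
            in.unread(c); // put the delimiter back
        }
        try {
            return Long.parseLong(digits.toString());
        } catch (NumberFormatException e) {
            // Still backtrack on failure (for example, no digits or more than a long can hold).
            in.unread(digits.toString().getBytes(StandardCharsets.ISO_8859_1));
            throw new IOException("Not a long: " + digits, e);
        }
    }

    // Convenience wrapper for the example: reads the leading object number from a string.
    static long parseObjectNumber(String data) throws IOException {
        byte[] bytes = data.getBytes(StandardCharsets.ISO_8859_1);
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(bytes), bytes.length + 1);
        return readLong(in);
    }
}
```

For example, ReadLongSketch.parseObjectNumber("4294967296 0 obj") succeeds where an int-based read would fail. A value that overflows even a long would still raise an error, so the skip logic would also need to make progress in that case.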
Do you agree with that?
BR,
Eric