A 32-bit int is used to calculate file offsets for COS objects. This works for small PDF files but breaks when the PDF file is larger than 2 GB. For many large files (136 of the 216 in my sample set), integer overflow produces negative file offsets for some of the COS objects, which results in an IOException being thrown in COSParser.java at line 728. Note that these negative offsets are not valid object stream references.
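The overflow can be reproduced in isolation. The sketch below is not the PDFBox code; the class and method names are hypothetical, and it assumes the offset is read as a big-endian byte field and accumulated into a numeric variable. It only illustrates how the same bytes decode to a negative value in an int and to the expected value in a long.

```java
public class OffsetOverflowDemo {

    // Hypothetical helper: accumulates a big-endian field into a 32-bit int.
    // Any value of 2^31 or more wraps around to a negative number.
    static int decodeAsInt(byte[] field) {
        int value = 0;
        for (int i = 0; i < field.length; i++) {
            value += (field[i] & 0xff) << ((field.length - i - 1) * 8);
        }
        return value;
    }

    // Same loop, but each byte is widened to long before shifting,
    // so offsets beyond 2 GB are preserved.
    static long decodeAsLong(byte[] field) {
        long value = 0;
        for (int i = 0; i < field.length; i++) {
            value += (long) (field[i] & 0xff) << ((field.length - i - 1) * 8);
        }
        return value;
    }

    public static void main(String[] args) {
        // 0x80000000 = 2147483648, the first offset that no longer fits in an int.
        byte[] field = {(byte) 0x80, 0x00, 0x00, 0x00};
        System.out.println(decodeAsInt(field));  // prints -2147483648
        System.out.println(decodeAsLong(field)); // prints 2147483648
    }
}
```

Widening to long before the shift matters: casting only the final result would still perform the arithmetic in 32 bits and keep the overflow.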
I have fixed the problem in my local copy of the code by modifying PDFXrefStreamParser.java starting at line 158.
I can submit a sample PDF file if desired (it will be more than 2 GB in size).