Created attachment 27776
Patch of files to get AIOOB stack trace out

I have a collection of files that, when processed via Tika, trigger a RuntimeException stemming from an ArrayIndexOutOfBoundsException (AIOOB) with no stack trace or cause information.

The root AIOOB issue was fixed for many of these files in the daily build after the 3.8-b4 release, but was reintroduced by the HWPF rework done in revision 1178083. (The files process successfully with revision 1178063.)

The files that produce this error are confirmed as valid by Microsoft's bffvalidator utility. Due to the sensitive nature of their contents, I'm not able to include them as examples. I may be able to get a coworker to help me hex-edit out the sensitive information so that I can attach them later to help fix the root issue causing the AIOOB.

In the meantime, I was able to pinpoint where the mysterious AIOOB exception comes from. It originates in ListLevel() when performing a LittleEndian.getShort().

The patch I've included does two things:
* Logs a warning message when the buffer's bounds are about to be overrun.
* Catches the AIOOB exception from the getShort() call and rethrows it as a RuntimeException, so that additional stack trace information is available. (At least it is now visible in Tika.)

The patch should help highlight future documents that hit this problem, and make it easier to track down and fix the cause so that they can be processed.
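For readers without the attachment, the two changes described above can be illustrated roughly as follows. This is a minimal sketch, not the attached patch: the class name ListLevelReadSketch and the methods getShortLE() and readShortChecked() are hypothetical, and getShortLE() is a local stand-in for LittleEndian.getShort() so that the example runs without POI on the classpath.

```java
import java.util.logging.Logger;

// Hypothetical illustration only; none of these names come from the POI source.
class ListLevelReadSketch {
    private static final Logger LOG =
            Logger.getLogger(ListLevelReadSketch.class.getName());

    /** Local stand-in for a little-endian 16-bit read; throws AIOOB past the end of buf. */
    static short getShortLE(byte[] buf, int offset) {
        int low = buf[offset] & 0xFF;
        int high = buf[offset + 1] & 0xFF;
        return (short) ((high << 8) | low);
    }

    /** Warns before an out-of-range read, then rethrows the AIOOB with context and cause. */
    static short readShortChecked(byte[] buf, int offset) {
        // Change 1: log a warning when the 2-byte read would overrun the buffer.
        if (offset < 0 || offset > buf.length - 2) {
            LOG.warning("About to read 2 bytes at offset " + offset
                    + " from a buffer of length " + buf.length);
        }
        try {
            return getShortLE(buf, offset);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Change 2: wrap the AIOOB so callers see a message and the original stack trace.
            throw new RuntimeException("Short read at offset " + offset
                    + " exceeds buffer length " + buf.length, e);
        }
    }

    public static void main(String[] args) {
        byte[] buf = {0x34, 0x12, 0x7F};
        System.out.println(readShortChecked(buf, 0)); // in-range read
        try {
            readShortChecked(buf, 2); // only one byte left in the buffer
        } catch (RuntimeException e) {
            e.printStackTrace(); // trace now includes the AIOOB as the cause
        }
    }
}
```

Passing the original exception as the cause is what preserves the stack trace; a wrapper that only copies the message would lose the location of the failing read.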
Created attachment 27806
Hex-edited sample test file to trigger AIOOB #1

Test case file 1. The original was hex-edited to remove sensitive information. After hex-editing, the first attempt produced a stack trace; subsequent runs did not. This file worked with revision 1178063.
Created attachment 27807
Hex-edited sample test file to trigger AIOOB #2

Test case file 2. The original was hex-edited to remove sensitive information. This file worked with revision 1178063.
Created attachment 27808
Hex-edited sample test file to trigger AIOOB #3

Test case file 3. The original was hex-edited to remove sensitive information. After hex-editing, the first attempt produced a stack trace; subsequent runs did not. This file worked with revision 1178063.
This should be fixed in revision 1195077. Please check.
The fix appears to have resolved the issue for 90% of the files. I will try to get examples of the remaining cases and open a new bug if those files are valid and still failing. Thanks for the fix!