Bug #52372 is one instance of a family of bugs that are triggered by either unsupported features or broken files and cause the parser to fail very hard. So we wrote https://github.com/lacostej/tika-hardener to make it easier to identify places in the code that could be improved to fail more gracefully. Feedback appreciated.
This is probably one for the Tika dev list. If you find issues in the underlying POI code that are still present in the latest POI svn snapshots, please report them here. For general discussions on parser stability, Tika is the place to have the discussion.
We will report individual issues as we find them. I just thought you might want to look at the code and maybe adapt it for internal use in the POI project. I'm not sure I will personally be able to be on the mailing list; I will ask some co-workers if they can. Feel free to add info to this issue, as I will follow up here.