Wait, the exception that this change now catches & logs occurs while decoding an OLE10 embedded entry (into its byte data), not while actually parsing the resulting byte data. If an exception is hit later, when we recurse into parseEmbedded, it is still thrown as before, so your custom AutoDetectParser will still see/handle it.
Hmm, you are right: I will still see exceptions from embedded docs, and this will improve parsing of the container.
But I think this is separately a good idea (an AutoDetectParser logging & continuing by default): is this something you could possibly contribute...?
I would like to, but I do not think my code is of good enough quality. I also think the meaning of "continuing" is application-specific. My app has a Raw/Binary StringParser that uses heuristics to extract mixed ISO-8859-1, UTF-8 and UTF-16 strings from unknown files. It is the fallBackParser, and it is also called when an exception is thrown by a corrupted doc. I could upload both, but they need a lot of enhancements.
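To make the "log and fall back" idea concrete, here is a minimal sketch of the pattern, not the actual code from my app. Every name in it (FallbackSketch, ByteParser, extractStrings, parseOrFallBack) is hypothetical and none of it is Tika API. It also only collects printable ASCII runs, which is a simplification of the mixed ISO-8859-1/UTF-8/UTF-16 heuristics described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Tika.
final class FallbackSketch {

    /** Stand-in for a real parser: turns bytes into text or throws. */
    interface ByteParser {
        String parse(byte[] data) throws Exception;
    }

    /**
     * Collects runs of at least minLen printable ASCII bytes,
     * similar in spirit to the Unix "strings" tool.
     */
    static List<String> extractStrings(byte[] data, int minLen) {
        List<String> out = new ArrayList<>();
        int start = -1; // index where the current run began, or -1 if none
        for (int i = 0; i <= data.length; i++) {
            // Bytes >= 0x80 are negative in Java, so they fail the first test.
            boolean printable = i < data.length && data[i] >= 0x20 && data[i] < 0x7f;
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                // The run ended (or we hit end of input): keep it if long enough.
                if (i - start >= minLen) {
                    out.add(new String(data, start, i - start, StandardCharsets.US_ASCII));
                }
                start = -1;
            }
        }
        return out;
    }

    /** Tries the primary parser; on any exception, logs it and falls back. */
    static String parseOrFallBack(ByteParser primary, byte[] data) {
        try {
            return primary.parse(data);
        } catch (Exception e) {
            System.err.println("Primary parser failed, falling back: " + e);
            return String.join("\n", extractStrings(data, 4));
        }
    }
}
```

Whether falling back to raw strings is the right way to "continue" is exactly the application-specific part: another app might prefer to skip the document or rethrow after logging.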
Do you have an example corrupted document? We could test before/after this change and see.
Not of the kind you have, but now I see that the parsing will be better after this change.