Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
If eventHandlingThread throws an exception while handling an event (e.g. because it is unable to flush an event to HDFS), the thread dies. Subsequent events are still enqueued, but they are not handled until JobHistoryEventHandler stops. If handling these queued events also throws an exception, the remaining events are lost. One consequence is that job history files may never be moved to mapreduce.jobhistory.done-dir.
There should be some fail-safe logic here to prevent these events from being lost. We should also take care that the same exception is not logged for every event, so the logs are not cluttered with repeated copies of the same stack trace. Perhaps we can make the number of failed handleEvent calls configurable, and only give up on a clean shutdown once that limit is reached.
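
As a rough illustration of the proposal above, here is a minimal standalone sketch in Java. It is not the actual JobHistoryEventHandler code and uses no Hadoop APIs: the class ResilientEventDrainer, its maxFailures limit, and the in-memory log list are all hypothetical names chosen for this example. It assumes the handler fails with a RuntimeException, keeps going after a failure instead of dying, logs a given failure only once while it keeps repeating, and hands back the unhandled events once the failure limit is reached rather than dropping them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical sketch: drains queued events without losing them on failure. */
class ResilientEventDrainer<E> {
    private final Consumer<E> handler;   // stands in for handleEvent
    private final int maxFailures;       // hypothetical configurable limit
    private final List<String> log = new ArrayList<>(); // stands in for a real logger
    private int failures = 0;
    private String lastLoggedError = null;

    ResilientEventDrainer(Consumer<E> handler, int maxFailures) {
        this.handler = handler;
        this.maxFailures = maxFailures;
    }

    /** Handles every queued event it can; returns the ones it could not handle. */
    List<E> drain(Queue<E> queue) {
        List<E> unhandled = new ArrayList<>();
        while (!queue.isEmpty()) {
            E event = queue.poll();
            if (gaveUp()) {
                // Limit reached: stop calling the handler, but keep the event
                // so the caller can report or persist it instead of losing it.
                unhandled.add(event);
                continue;
            }
            try {
                handler.accept(event);
            } catch (RuntimeException e) {
                failures++;
                unhandled.add(event);
                String signature = e.getClass().getName() + ": " + e.getMessage();
                // Log only when the failure differs from the last one logged,
                // so a repeating error does not flood the log.
                if (!signature.equals(lastLoggedError)) {
                    log.add("handleEvent failed: " + signature);
                    lastLoggedError = signature;
                }
            }
        }
        return unhandled;
    }

    boolean gaveUp() {
        return failures >= maxFailures;
    }

    List<String> loggedMessages() {
        return log;
    }
}
```

A caller would construct the drainer with the configured limit, pass it the pending event queue during shutdown, and then decide what to do with the returned list (for example, retry later or report how many events were not written). Whether a failed event should be retried, and how failures are compared for log suppression, are design choices this sketch leaves open.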