Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Reviewed
Description
In YARN-7051, we ran into a case where the preemption monitor thread hung with no indication of why.
The preemption monitor is scheduled on a ScheduledExecutorService from SchedulingMonitor#serviceStart. Once the monitor throws an uncaught throwable, the executor stores it in the task's future and suppresses all further runs. Nothing ever gets the result of that future, so the failure is never reported; the thread running the preemption monitor never dies, and the monitor never gets rescheduled.
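The failure mode described above can be reproduced with the plain JDK. The following is a minimal sketch, not the SchedulingMonitor code: the class name, the `Outcome` holder, and the simulated exception are hypothetical, and the 10 ms period is chosen only to keep the demo short.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class SilentFailureDemo {
    /** Hypothetical holder: how often the task ran and what its future captured. */
    static final class Outcome {
        final int runs;
        final Throwable cause;
        Outcome(int runs, Throwable cause) {
            this.runs = runs;
            this.cause = cause;
        }
    }

    static Outcome runFailingMonitor() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        try {
            // Stand-in for the monitor: it throws on every invocation.
            ScheduledFuture<?> future = ses.scheduleAtFixedRate(() -> {
                runs.incrementAndGet();
                throw new IllegalStateException("simulated monitor failure");
            }, 0, 10, TimeUnit.MILLISECONDS);

            // Long enough for many runs if the executor kept rescheduling the task.
            Thread.sleep(200);

            // Nothing is printed or logged on its own; the throwable only
            // surfaces when someone asks the future for its result.
            Throwable cause = null;
            try {
                future.get(1, TimeUnit.SECONDS);
            } catch (ExecutionException e) {
                cause = e.getCause();
            }
            return new Outcome(runs.get(), cause);
        } catch (InterruptedException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            ses.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Outcome o = runFailingMonitor();
        System.out.println("runs = " + o.runs + ", captured = " + o.cause);
    }
}
```

In this sketch the run counter stops after the first failure, and the exception is visible only because the code calls `get()` on the future explicitly, which is the step that nothing performs for the monitor.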
If HadoopExecutors were used instead, it would at least provide a HadoopScheduledThreadPoolExecutor, which logs the exception if one happens.
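As a rough illustration of what a logging scheduled executor can look like, here is a JDK-only sketch that overrides `afterExecute` and inspects the finished future. It is not Hadoop's implementation: the class name `LoggingScheduledExecutor`, the `failures` list, and the use of `System.err` in place of a real logger are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class LoggingScheduledExecutor extends ScheduledThreadPoolExecutor {
    // Hypothetical: kept only so the demo can inspect what was reported.
    final List<Throwable> failures = new CopyOnWriteArrayList<>();

    LoggingScheduledExecutor(int corePoolSize) {
        super(corePoolSize);
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // Scheduled tasks are wrapped in a Future, so t is normally null here.
        // A periodic task that succeeded is reset and not "done"; checking
        // isDone() first avoids blocking in get() on such a task.
        if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
            try {
                ((Future<?>) r).get();
            } catch (ExecutionException e) {
                t = e.getCause();
            } catch (CancellationException e) {
                // Cancelled tasks are not failures; nothing to report.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        if (t != null) {
            System.err.println("Scheduled task failed: " + t);
            failures.add(t);
        }
    }

    /** Schedules a failing periodic task and returns what the executor reported. */
    static List<Throwable> demo() {
        LoggingScheduledExecutor ses = new LoggingScheduledExecutor(1);
        try {
            ses.scheduleAtFixedRate(() -> {
                throw new IllegalStateException("simulated monitor failure");
            }, 0, 10, TimeUnit.MILLISECONDS);
            Thread.sleep(200);
            return ses.failures;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            ses.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("reported failures: " + demo().size());
    }
}
```

The periodic task is still suppressed after it throws, so this approach only makes the failure visible; keeping the monitor alive would additionally require catching throwables inside the task itself.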