Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
0.9.0, 1.0.0
-
None
-
None
Description
In deploy.master.Master, the completeRecovery method is the last thing to be called when a standalone Master is recovering from failure. It is responsible for resetting some state, relaunching drivers, and eventually resuming its scheduling duties.
There are currently four places in Master.scala where completeRecovery is called. Three of them are from within the actor's receive method, and aren't problems. The last starts from within receive when the ElectedLeader message is received, but the actual completeRecovery() call is made from the Akka scheduler. That means that it will execute on a different scheduler thread, and Master itself will end up running (i.e., schedule() ) from that Akka scheduler thread. Among other things, that means that uncaught exception handling will be different – https://issues.apache.org/jira/browse/SPARK-1620