Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Description
In `Master.removeWorker`, the master clears executor and driver state but does not clear application state. Application state is cleared only in two places: when the master receives `UnregisterApplication`, and in `onDisconnect`. The first happens when the driver shuts down gracefully. The second is triggered by Netty's `channelInactive` callback, which fires when the channel is closed. Neither path handles a network partition between the master and a worker.
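To make the gap concrete, here is a toy Python model of the bookkeeping described above. It is a sketch under stated assumptions, not Spark's code: `ToyMaster`, `remove_worker`, `finish_application`, and the dictionary fields are all hypothetical, and the real `Master` (written in Scala) tracks far more state. It only shows that if worker removal clears executors and drivers but never touches application state, an app whose worker has been lost stays `RUNNING`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical, heavily simplified model of the master's bookkeeping.

    None of these names are Spark's real classes or fields.
    """

    # worker id -> ids of executors running on that worker
    executors_by_worker: dict = field(default_factory=dict)
    # worker id -> ids of drivers running on that worker
    drivers_by_worker: dict = field(default_factory=dict)
    # app id -> "RUNNING" or "FINISHED"
    apps: dict = field(default_factory=dict)
    # app id -> set of (worker id, executor id) pairs owned by the app
    app_executors: dict = field(default_factory=dict)

    def remove_worker(self, worker_id):
        # Behaviour described in the issue: executor and driver state for
        # the lost worker is cleared...
        for exec_id in self.executors_by_worker.pop(worker_id, set()):
            for owned in self.app_executors.values():
                owned.discard((worker_id, exec_id))
        self.drivers_by_worker.pop(worker_id, None)
        # ...but nothing here touches self.apps, so app state is left as-is.

    def finish_application(self, app_id):
        # Hypothetical stand-in for the only two paths that clear app state:
        # UnregisterApplication (graceful driver shutdown) and onDisconnect
        # (channel closed). A network partition triggers neither.
        self.apps[app_id] = "FINISHED"
        self.app_executors.pop(app_id, None)


master = ToyMaster()
master.executors_by_worker["worker1"] = {0}
master.drivers_by_worker["worker1"] = {"driver-0"}
master.apps["app-1"] = "RUNNING"
master.app_executors["app-1"] = {("worker1", 0)}

# worker1 is partitioned from the master and is eventually removed.
master.remove_worker("worker1")
print(master.apps["app-1"])           # prints RUNNING: stale app state
print(master.app_executors["app-1"])  # prints set(): no executors left
```

In this model, the app only moves to `FINISHED` if something calls `finish_application`, which is exactly the step a partition never reaches.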
To reproduce, follow the steps in SPARK-19900 and see the screenshots: when worker1 is partitioned from the master, the app `app-xxx-000` is still shown as running instead of finished, even though worker1 is down.
cc CodingCat