Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- 1.0.2, 1.1.0
- None
Description
In standalone mode, a worker that gets disconnected from the master for any reason never attempts to reconnect. The only way to get it back is to bounce (restart) the worker, after which it reconnects to the master.
The preferred alternative is to follow what Hadoop does: after a disconnect, attempt to reconnect at a fixed interval until successful (I think it retries indefinitely, every 10 seconds).
This has been observed by:
- pkolaczk in http://apache-spark-user-list.1001560.n3.nabble.com/Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td6240.html
- romi-totango in http://apache-spark-user-list.1001560.n3.nabble.com/Re-Workers-disconnected-from-master-sometimes-and-never-reconnect-back-td15335.html
- aash
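The retry behaviour proposed above can be illustrated with a minimal sketch. This is not Spark's or Hadoop's actual code: `reconnect_until_success`, `try_connect`, and the injectable `sleep` parameter are hypothetical names chosen for illustration, and the 10-second default only mirrors the interval recalled in the description.

```python
import time
from typing import Callable


def reconnect_until_success(
    try_connect: Callable[[], bool],
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_connect at a fixed interval until it returns True.

    Retries indefinitely and returns the number of attempts made.
    try_connect is a hypothetical callback standing in for the
    worker's "register with master" step.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            if try_connect():
                return attempts
        except OSError:
            # A network-level error counts as a failed attempt.
            pass
        # Wait a fixed interval before the next attempt.
        sleep(interval_seconds)
```

A worker would call this from its disconnect handler, for example `reconnect_until_success(register_with_master)`, where `register_with_master` is again an assumed function that returns `True` once the master acknowledges the worker.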
Issue Links
- duplicates: SPARK-1231 DEAD worker should recover automaticly (Closed)