[SPARK-3736] Workers should reconnect to Master if disconnected - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.0.2, 1.1.0
Fix Version/s: 1.2.0
Component/s: Spark Core
Labels: None
Target Version/s: 1.2.0

Description

In standalone mode, a worker that gets disconnected from the master never attempts to reconnect. The worker has to be restarted ("bounced") before it will reconnect to the master.

The preferred alternative is to follow what Hadoop does: when there's a disconnect, attempt to reconnect at a fixed interval until successful (I think it retries indefinitely every 10 seconds).
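The retry behaviour proposed above can be sketched roughly as follows. This is only an illustration of "retry at a fixed interval until successful", not the code from Spark or from the linked pull request. The names `reconnect_until_successful`, `try_connect`, `interval_seconds`, `max_attempts`, and `sleep` are all hypothetical, and the 10-second default simply mirrors the interval guessed at in the description.

```python
import time
from typing import Callable, Optional


def reconnect_until_successful(
    try_connect: Callable[[], bool],
    interval_seconds: float = 10.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry try_connect at a fixed interval; return the number of attempts made.

    try_connect is a hypothetical callback that returns True once the worker
    has reached the master again. With max_attempts=None the loop retries
    indefinitely, as the description suggests.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            if try_connect():
                return attempts
        except OSError:
            # Assumption: network-level failures count as a failed attempt.
            pass
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f"gave up after {attempts} attempts")
        # Wait a fixed interval before the next attempt.
        sleep(interval_seconds)
```

A caller would pass its own connection check, for example `reconnect_until_successful(my_connect)`, where `my_connect` is whatever function attempts to reach the master. The `sleep` parameter exists only so the wait can be replaced in tests.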

This has been observed by:

Issue Links

duplicates: SPARK-1231 DEAD worker should recover automaticly (Closed)
links to: [Github] Pull Request #2828 (mccheah)

People

Assignee: Matthew Cheah
Reporter: Andrew Ash
Votes: 1
Watchers: 9

Dates

Created: 30/Sep/14 00:16
Updated: 14/Nov/14 10:44
Resolved: 20/Oct/14 18:35