Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 0.9.0
Fix Version/s: None
Component/s: None
Labels: None
Description
In standalone mode our workers keep dying whenever a job fails, and we either have to reboot all the nodes or run 'sudo service spark-worker restart' across the cluster. I don't think this should happen - the job failures are often not even that severe. There is an SO question about this with 5 upvotes: http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails
We shouldn't be giving restart privileges to our devs, so our sysadmin has to restart the workers frequently. When the sysadmin is not around, there is nothing our devs can do.
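
For context, the current workaround is roughly the loop below (a sketch only - the hostnames, passwordless ssh, and sudo access are assumptions about our environment, not part of Spark):

    # restart the standalone worker daemon on every node in the cluster
    for node in worker01 worker02 worker03; do
        ssh "$node" 'sudo service spark-worker restart'
    done

Having to run something like this after ordinary job failures is the part that seems wrong.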
Many thanks