Spark / SPARK-4759

Deadlock in complex spark job in local mode


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.1, 1.2.0, 1.3.0
    • Fix Version/s: 1.1.2, 1.2.1, 1.3.0
    • Component/s: Spark Core
    • Labels:
    • Environment:

      Java version "1.7.0_51"
      Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
      Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

      Mac OSX 10.10.1
      Using local spark context

      Description

      The attached test class runs two identical jobs that perform some iterative computation on an RDD[(Int, Int)]. This computation involves

1. taking new data and merging it with the previous result
      2. caching and checkpointing the new result
      3. rinse and repeat

The first time the job is run, it completes successfully, and the Spark context is shut down. The second time the job is run, with a new Spark context in the same process, the job hangs indefinitely, having scheduled only a subset of the necessary tasks for the final stage.

I've been able to produce a test case that reproduces the issue, and I've added some comments where some knockout experimentation has left breadcrumbs as to where the issue might be.
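For readers without the attachment, the shape of the repro described above can be sketched roughly as follows. This is a hypothetical minimal sketch, not the attached test class: the method name, data, iteration count, and checkpoint directory are all assumptions; only the overall pattern (two identical local-mode jobs in one JVM, each iterating merge/cache/checkpoint on an RDD[(Int, Int)]) comes from the report.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical repro sketch: one iterative job, run twice in the same process.
def runJob(): Unit = {
  val conf = new SparkConf().setMaster("local[*]").setAppName("SPARK-4759-repro")
  val sc = new SparkContext(conf)
  sc.setCheckpointDir("/tmp/spark-4759-checkpoints") // assumed location

  var result = sc.parallelize(Seq((1, 1), (2, 2)))   // RDD[(Int, Int)]
  for (i <- 1 to 5) {                                // iteration count is arbitrary
    val newData = sc.parallelize(Seq((i, i)))
    result = result.union(newData).reduceByKey(_ + _) // 1. merge new data with previous result
    result.cache()                                    // 2. cache and
    result.checkpoint()                               //    checkpoint the new result
    result.count()                                    // force evaluation; 3. rinse and repeat
  }
  sc.stop()
}

runJob() // first run completes
runJob() // second run, new SparkContext in the same JVM: hangs per this report
```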

        Attachments

          Activity

            People

            • Assignee:
              andrewor14 Andrew Or
              Reporter:
              dgshep Davis Shepherd

              Dates

              • Created:
                Updated:
                Resolved:
