Spark / SPARK-4759

Deadlock in complex spark job in local mode


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.1, 1.2.0, 1.3.0
    • Fix Version/s: 1.1.2, 1.2.1, 1.3.0
    • Component/s: Spark Core
    • Labels:
    • Environment:

      Java version "1.7.0_51"
      Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
      Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

      Mac OSX 10.10.1
      Using local spark context

      Description

      The attached test class runs two identical jobs that perform some iterative computation on an RDD[(Int, Int)]. This computation involves

1. taking new data and merging it with the previous result
      2. caching and checkpointing the new result
      3. rinse and repeat

The first time the job is run, it completes successfully, and the Spark context is shut down. The second time the job is run, with a new Spark context in the same process, the job hangs indefinitely, having scheduled only a subset of the necessary tasks for the final stage.

I've been able to produce a test case that reproduces the issue, and I've added some comments where some knockout experimentation has left breadcrumbs as to where the issue might be.
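For readers without the attachment, the shape of the repro described above can be sketched roughly as follows. This is a hypothetical minimal sketch, not the attached test class: the method name, data, iteration count, and checkpoint directory are all assumptions; only the overall pattern (two identical local-mode jobs in one JVM, each iterating merge/cache/checkpoint on an RDD[(Int, Int)]) comes from the report.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical repro sketch: one iterative job, run twice in the same process.
def runJob(): Unit = {
  val conf = new SparkConf().setMaster("local[*]").setAppName("SPARK-4759-repro")
  val sc = new SparkContext(conf)
  sc.setCheckpointDir("/tmp/spark-4759-checkpoints") // assumed location

  var result = sc.parallelize(Seq((1, 1), (2, 2)))   // RDD[(Int, Int)]
  for (i <- 1 to 5) {                                // iteration count is arbitrary
    val newData = sc.parallelize(Seq((i, i)))
    result = result.union(newData).reduceByKey(_ + _) // 1. merge new data with previous result
    result.cache()                                    // 2. cache and
    result.checkpoint()                               //    checkpoint the new result
    result.count()                                    // force evaluation; 3. rinse and repeat
  }
  sc.stop()
}

runJob() // first run completes
runJob() // second run, new SparkContext in the same JVM: hangs per this report
```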

        Attachments

          Activity

            People

            • Assignee:
              andrewor14 Andrew Or
              Reporter:
              dgshep Davis Shepherd

              Dates

              • Created:
                Updated:
                Resolved:
