Hadoop Common / HADOOP-1461

Corner-case deadlock in TaskTracker

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.12.3
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

      Thanks to Koji for the attached stack-trace...

      Summary:

      main()
      -> offerService()
      -> markUnresponsiveTasks (locks the TaskTracker here)
      -> purgeTask()
      -> removeTaskFromJob (waiting to lock the RunningJob object)

      taskCleanup
      -> purgeJob (locks the RunningJob object)
      -> TIP.jobHasFinished()
      -> TIP.cleanup (waiting to lock the TaskTracker)
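The two call chains above take the same pair of locks in opposite order. A minimal Java sketch of that pattern follows. It is not TaskTracker code: the class, method, and thread names are hypothetical, and it uses ReentrantLock.tryLock with a timeout instead of synchronized so that the demo terminates rather than hanging the way the real threads do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionDemo {

    /** Returns how many of the two threads failed to obtain their second lock. */
    static int countBlockedThreads() {
        // Hypothetical stand-ins for the TaskTracker and RunningJob monitors.
        ReentrantLock trackerLock = new ReentrantLock();
        ReentrantLock jobLock = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothGaveUp = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Like main(): TaskTracker first, then RunningJob.
        Thread mainLike = new Thread(
            () -> attempt(trackerLock, jobLock, bothHoldFirst, bothGaveUp, blocked),
            "main-like");
        // Like taskCleanup: RunningJob first, then TaskTracker.
        Thread cleanupLike = new Thread(
            () -> attempt(jobLock, trackerLock, bothHoldFirst, bothGaveUp, blocked),
            "taskCleanup-like");

        mainLike.start();
        cleanupLike.start();
        try {
            mainLike.join();
            cleanupLike.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked.get();
    }

    static void attempt(ReentrantLock first, ReentrantLock second,
                        CountDownLatch bothHoldFirst, CountDownLatch bothGaveUp,
                        AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until each thread holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With synchronized this would block forever; here it times out.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep the first lock until both threads have given up.
            bothGaveUp.countDown();
            bothGaveUp.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on second lock: " + countBlockedThreads());
    }
}
```

Each thread holds its first lock until both attempts have timed out, so neither can ever obtain the second one; with plain monitors there is no timeout and both threads wait indefinitely.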

      -

      This is a clear case of a lock-ordering problem in synchronization: the two threads acquire the TaskTracker and RunningJob locks in opposite orders. It's a corner case, since it depends on the child VM becoming unresponsive while the cleanup thread kicks in, which is why I'm marking this for 0.14.0 rather than 0.13.0. What do others think about this?

      -

      Two possible solutions to break the deadlock cycle:

      a) Make TaskTracker.purgeJob a synchronized method, so that it locks the TaskTracker before locking the RunningJob object.
      b) Make the TaskTracker.tasks map a Collections.synchronizedMap, doing away with the need to lock the TaskTracker in TIP.cleanup.
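Option a) amounts to giving both paths the same lock order: TaskTracker first, then RunningJob. A rough sketch of the idea, assuming a heavily simplified tracker; the TrackerSketch class, its fields, and the demo helper are hypothetical and do not reflect the real TaskTracker or the attached patch:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TrackerSketch {
    /** Hypothetical stand-in for RunningJob: just the ids of its tasks. */
    static class RunningJob {
        final Set<String> taskIds = new HashSet<>();
    }

    private final Map<String, String> tasks = new HashMap<>();     // taskId -> jobId
    private final Map<String, RunningJob> jobs = new HashMap<>();  // jobId -> job

    synchronized void addTask(String jobId, String taskId) {
        RunningJob job = jobs.computeIfAbsent(jobId, k -> new RunningJob());
        synchronized (job) {
            job.taskIds.add(taskId);
        }
        tasks.put(taskId, jobId);
    }

    // main()-like path: tracker monitor first, then the job monitor.
    synchronized void purgeTask(String taskId) {
        String jobId = tasks.remove(taskId);
        RunningJob job = (jobId == null) ? null : jobs.get(jobId);
        if (job != null) {
            synchronized (job) {
                job.taskIds.remove(taskId);
            }
        }
    }

    // Cleanup path with option a): 'synchronized' takes the tracker monitor
    // before the job monitor, matching the order used by purgeTask.
    synchronized void purgeJob(String jobId) {
        RunningJob job = jobs.remove(jobId);
        if (job == null) {
            return;
        }
        synchronized (job) {
            for (String taskId : job.taskIds) {
                cleanup(taskId);  // re-enters the tracker monitor already held
            }
            job.taskIds.clear();
        }
    }

    // Stand-in for TIP.cleanup: needs the tracker monitor.
    synchronized void cleanup(String taskId) {
        tasks.remove(taskId);
    }

    synchronized int liveTasks() {
        return tasks.size();
    }

    /** Runs both paths concurrently and returns the number of tasks left. */
    static int demo() {
        TrackerSketch tracker = new TrackerSketch();
        tracker.addTask("job1", "t1");
        tracker.addTask("job1", "t2");
        Thread mainLike = new Thread(() -> tracker.purgeTask("t1"));
        Thread cleanupLike = new Thread(() -> tracker.purgeJob("job1"));
        mainLike.start();
        cleanupLike.start();
        try {
            mainLike.join();
            cleanupLike.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return tracker.liveTasks();
    }
}
```

Because Java monitors are reentrant, the nested cleanup call does not block on the tracker lock that purgeJob already holds, and no thread ever waits for the tracker while holding a job.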

      I'd prefer a): TaskTracker.tasks is referenced in multiple places in synchronized methods, so b) would mean revisiting each of them, and a) is the less intrusive change.
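For comparison, a minimal sketch of what option b) implies, again with hypothetical names (SynchronizedTasksSketch, register, countForJob) rather than real TaskTracker members. Single map operations become safe without the tracker lock, but any compound action such as iteration must still synchronize on the map itself, which is the kind of call-site review that makes b) the more intrusive option:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedTasksSketch {
    // taskId -> jobId, wrapped so that single operations are thread-safe.
    private final Map<String, String> tasks =
        Collections.synchronizedMap(new HashMap<>());

    void register(String taskId, String jobId) {
        tasks.put(taskId, jobId);
    }

    // Stand-in for TIP.cleanup: no tracker monitor needed any more.
    void cleanup(String taskId) {
        tasks.remove(taskId);
    }

    int liveTasks() {
        return tasks.size();
    }

    // Iteration is a compound action: Collections.synchronizedMap requires
    // the caller to hold the map's own lock while traversing its views.
    int countForJob(String jobId) {
        int count = 0;
        synchronized (tasks) {
            for (String owner : tasks.values()) {
                if (owner.equals(jobId)) {
                    count++;
                }
            }
        }
        return count;
    }
}
```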

      Thoughts?

      Attachments

        1. HADOOP-1461_1_20070605.patch
          0.6 kB
          Arun Murthy
        2. main_taskcleanup_deadlock.txt
          2 kB
          Arun Murthy

          People

            Assignee: Arun Murthy (acmurthy)
            Reporter: Arun Murthy (acmurthy)