  1. Hadoop Common
  2. HADOOP-547

ReduceTaskRunner can miss sending heartbeats if no map output copy finishes within "mapred.task.timeout"


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.2
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Description

      In ReduceTaskRunner, the main loop that sends heartbeats waits on copyResults, which is notified only when a copy thread finishes copying. As a result, a healthy reduce task that is still copying data can be failed if no map output copy completes within "mapred.task.timeout".

      ReduceTaskRunner.java:490
      try {
        copyResults.wait(); // <=========== Calls unconditional wait.
      } catch (InterruptedException e) { }

      wait() should be called with a timeout, possibly taskTimeout/2, after which the loop should send a heartbeat and go back to waiting.
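
      The proposed change could look roughly like the sketch below. This is not the attached patch: the class TimedWaitSketch, the Heartbeat interface, and the waitForCopy method are hypothetical names made up for illustration, and the list only stands in for copyResults. It assumes the loop wakes every taskTimeout/2 milliseconds and reports a heartbeat before waiting again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class TimedWaitSketch {
    /** Hypothetical stand-in for whatever reports progress to the TaskTracker. */
    interface Heartbeat { void send(); }

    /**
     * Blocks until copyResults holds a finished copy, but never waits longer
     * than taskTimeoutMs / 2 without sending a heartbeat first.
     */
    static <T> T waitForCopy(List<T> copyResults, long taskTimeoutMs, Heartbeat hb)
            throws InterruptedException {
        long waitMs = Math.max(1, taskTimeoutMs / 2);
        synchronized (copyResults) {
            while (copyResults.isEmpty()) {
                hb.send();                 // report liveness before each wait
                copyResults.wait(waitMs);  // timed wait instead of wait()
            }
            return copyResults.remove(0);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> copyResults = new ArrayList<>();
        AtomicInteger beats = new AtomicInteger();

        // Simulated copy thread that takes longer than the (assumed) 100 ms timeout.
        Thread copier = new Thread(() -> {
            try { Thread.sleep(250); } catch (InterruptedException e) { return; }
            synchronized (copyResults) {
                copyResults.add("map_0");
                copyResults.notifyAll();
            }
        });
        copier.start();

        String result = waitForCopy(copyResults, 100, beats::incrementAndGet);
        copier.join();
        System.out.println(result + " copied after " + beats.get() + " heartbeat(s)");
    }
}
```

      With a slow copy, the loop keeps sending heartbeats at intervals shorter than the timeout, so the task is not declared dead while it is still making progress on a long copy.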

      Attachments

        1. Hadoop-547.patch
          3 kB
          Sanjay Dahiya
        2. Hadoop-547_1.patch
          0.7 kB
          Sanjay Dahiya


          People

            Assignee: Sanjay Dahiya (sanjay.dahiya)
            Reporter: Sanjay Dahiya (sanjay.dahiya)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved:
