  1. Hadoop Common
  2. HADOOP-547

ReduceTaskRunner can miss sending heartbeats if no map output copy finishes within "mapred.task.timeout"


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.2
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None

      Description

      In ReduceTaskRunner, the main loop that sends heartbeats waits on copyResults, which is notified only when a copy thread finishes copying. As a result, healthy reduce tasks that are still copying data can fail if no map output copy finishes within "mapred.task.timeout".

      ReduceTaskRunner.java:490

      try {
        copyResults.wait();   // <=== unconditional wait
      } catch (InterruptedException e) { }

      wait() should be called with a timeout, possibly taskTimeout/2, after which the loop should send a heartbeat and go back to waiting.
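
The proposed fix can be illustrated with a minimal, self-contained sketch. This is not the code from the attached patches; the class HeartbeatWaitSketch and the methods sendHeartbeat, addCopyResult and waitForCopyResult are hypothetical names chosen for illustration, and the heartbeat is reduced to a counter. Only copyResults and the taskTimeout/2 interval come from the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: names other than copyResults are hypothetical.
class HeartbeatWaitSketch {
    private final List<String> copyResults = new ArrayList<>();
    private final long taskTimeoutMs;
    private final AtomicInteger heartbeatsSent = new AtomicInteger();

    HeartbeatWaitSketch(long taskTimeoutMs) {
        this.taskTimeoutMs = taskTimeoutMs;
    }

    // Stand-in for reporting progress to the task tracker.
    void sendHeartbeat() {
        heartbeatsSent.incrementAndGet();
    }

    int getHeartbeatsSent() {
        return heartbeatsSent.get();
    }

    // Called by a copy thread when a map output copy finishes.
    void addCopyResult(String result) {
        synchronized (copyResults) {
            copyResults.add(result);
            copyResults.notifyAll();
        }
    }

    // Waits for the next copy result, sending a heartbeat each time the
    // timed wait returns without a result being available.
    String waitForCopyResult() throws InterruptedException {
        long waitMs = Math.max(1, taskTimeoutMs / 2);
        synchronized (copyResults) {
            while (copyResults.isEmpty()) {
                copyResults.wait(waitMs);   // timed wait instead of wait()
                if (copyResults.isEmpty()) {
                    // Timed out (or woke spuriously): report liveness and
                    // go back to waiting. An extra heartbeat is harmless.
                    sendHeartbeat();
                }
            }
            return copyResults.remove(0);
        }
    }
}
```

With this shape, a slow copy no longer silences the task: the waiting loop wakes at least every taskTimeout/2 and sends a heartbeat, so the interval between heartbeats stays below the timeout even when no copy thread finishes.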

        Attachments

        1. Hadoop-547.patch
          3 kB
          Sanjay Dahiya
        2. Hadoop-547_1.patch
          0.7 kB
          Sanjay Dahiya


            People

            • Assignee:
              sanjay.dahiya Sanjay Dahiya
              Reporter:
              sanjay.dahiya Sanjay Dahiya
