Hadoop Map/Reduce: MAPREDUCE-4728

Interaction between out-of-band (OOB) heartbeats and the damper can cause the TaskTracker (TT) to heartbeat with zero delay

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.3
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When mapreduce.tasktracker.outofband.heartbeat is true and mapreduce.tasktracker.outofband.heartbeat.damper is large (for example, the default of 1000000), the TaskTracker (TT) doesn't wait for tasks to finish before heartbeating back to the JobTracker (JT). This puts excessive load on the JT, which in turn reduces overall cluster performance.
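A minimal sketch of why a large damper can collapse the heartbeat interval to zero. It assumes the interval is the base interval divided by (finished tasks * damper + 1) using integer arithmetic; that formula, the class name DamperSketch, and the 3000 ms base interval are assumptions for illustration and are not taken from this report.

```java
public class DamperSketch {
    // Assumed formula: the base interval shrinks as finished tasks accumulate.
    // Integer division truncates toward zero.
    public static long heartbeatInterval(long baseMs, int finished, int damper) {
        return baseMs / ((long) finished * damper + 1);
    }

    public static void main(String[] args) {
        long baseMs = 3000;   // hypothetical base heartbeat interval in ms
        int damper = 1000000; // default damper value cited in the description

        // No finished tasks: the full interval applies.
        System.out.println(heartbeatInterval(baseMs, 0, damper)); // prints 3000
        // One finished task: 3000 / 1000001 truncates to zero.
        System.out.println(heartbeatInterval(baseMs, 1, damper)); // prints 0
    }
}
```

Under this assumption, a single finished task is enough to request an immediate heartbeat, and the interval stays at zero for as long as the finished count stays above zero.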

      I believe the problem is in the following block of code. When getHeartbeatInterval() returns 0, remaining is already <= 0, so the while loop is skipped and we heartbeat back immediately, BUT finishedCount does not get reset, because the only reset sits inside that loop. Nothing appears to ever get us out of this situation, so the TT keeps heartbeating without ever sleeping.

              // accelerate to account for multiple finished tasks up-front
              long remaining =
                (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
              while (remaining > 0) {
                // sleeps for the wait time or
                // until there are *enough* empty slots to schedule tasks
                synchronized (finishedCount) {
                  finishedCount.wait(remaining);
      
                  // Recompute
                  now = System.currentTimeMillis();
                  remaining =
                    (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
      
                  if (remaining <= 0) {
                    // Reset count
                    finishedCount.set(0);
                    break;
                  }
                }
              }
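One possible way out, sketched under stated assumptions, is to reset the finished-task count on every exit path rather than only inside the loop body. This is not necessarily what the attached patch does. The class HeartbeatWaitSketch, its fields, and the interval formula are hypothetical stand-ins for the TaskTracker internals; only the loop shape follows the block above.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatWaitSketch {
    // Hypothetical stand-ins for TaskTracker state.
    public final AtomicInteger finishedCount = new AtomicInteger(0);
    private final long baseIntervalMs;
    private final int damper;

    public HeartbeatWaitSketch(long baseIntervalMs, int damper) {
        this.baseIntervalMs = baseIntervalMs;
        this.damper = damper;
    }

    // Assumed formula, not confirmed by the report.
    public long getHeartbeatInterval(int finished) {
        return baseIntervalMs / ((long) finished * damper + 1);
    }

    // Blocks until the next heartbeat is due, then clears the finished count.
    public void waitForNextHeartbeat(long lastHeartbeat) {
        synchronized (finishedCount) {
            long now = System.currentTimeMillis();
            long remaining =
                (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
            try {
                while (remaining > 0) {
                    // Sleeps for the wait time or until a finished task notifies.
                    finishedCount.wait(remaining);
                    now = System.currentTimeMillis();
                    remaining =
                        (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
            // Reset even when the loop body never ran (interval was already 0).
            finishedCount.set(0);
        }
    }

    public static void main(String[] args) {
        HeartbeatWaitSketch tt = new HeartbeatWaitSketch(50, 1000000);
        tt.finishedCount.set(1); // one task finished: interval collapses to 0
        tt.waitForNextHeartbeat(System.currentTimeMillis()); // returns at once
        System.out.println(tt.finishedCount.get()); // prints 0
    }
}
```

With the reset outside the loop, the heartbeat that follows an out-of-band one sees a count of zero and waits for the full interval again unless another task finishes in the meantime.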
      
      

        Activity

        Nathan Roberts added a comment -

        Quick patch illustrating a possible approach.

        Thomas Graves added a comment -

        This might be a duplicate of MAPREDUCE-4478.


          People

          • Assignee: Unassigned
          • Reporter: Nathan Roberts
          • Votes: 0
          • Watchers: 5
