Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 1.0.1, 1.0.2, 1.0.3
    • Fix Version/s: 1.1.2
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixed a bug in TaskTracker's heartbeat to keep it under control.
    1. 4478.diff
      0.8 kB
      Liyin Liang
    2. MAPREDUCE-4478.patch
      0.7 kB
      Suresh Srinivas

        Activity

        liangly Liyin Liang added a comment -

        Two configuration properties control the TaskTracker's heartbeat interval: mapreduce.tasktracker.outofband.heartbeat and mapreduce.tasktracker.outofband.heartbeat.damper. If mapreduce.tasktracker.outofband.heartbeat is set to true and mapreduce.tasktracker.outofband.heartbeat.damper is left at its default value (1000000), the TaskTracker may send heartbeats back to back, with no interval between them.

        The code to control heartbeat interval is as follows:

        long now = System.currentTimeMillis();

        // accelerate to account for multiple finished tasks up-front
        long remaining =
          (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
        while (remaining > 0) {
          // sleeps for the wait time or
          // until there are *enough* empty slots to schedule tasks
          synchronized (finishedCount) {
            finishedCount.wait(remaining);

            // Recompute
            now = System.currentTimeMillis();
            remaining =
              (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;

            if (remaining <= 0) {
              // Reset count
              finishedCount.set(0);
              break;
            }
          }
        }
        

        On the first computation, if finishedCount is greater than zero, getHeartbeatInterval(finishedCount.get()) returns zero, so remaining is less than or equal to zero and the while loop is skipped entirely. Because the reset happens only inside the loop, finishedCount is never set back to zero, and every subsequent heartbeat is also sent without waiting.
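
To see why the interval collapses to zero, here is a minimal sketch of the arithmetic. It assumes the interval is the base heartbeat interval divided by (finished tasks * damper + 1) using integer division, with a base interval of 3000 ms. The method name mirrors the one quoted above, but the formula and the base value are assumptions for illustration and are not taken from this thread.

```java
class HeartbeatDamperSketch {
    // Hypothetical stand-in for getHeartbeatInterval(int); the formula is an assumption.
    static long getHeartbeatInterval(long baseIntervalMs, int finishedTasks, int damper) {
        // Integer division: any divisor larger than baseIntervalMs yields 0.
        return baseIntervalMs / ((long) finishedTasks * damper + 1);
    }

    public static void main(String[] args) {
        long baseIntervalMs = 3000L; // assumed base interval
        int damper = 1_000_000;      // default damper value quoted in the report

        // No finished tasks: the full interval applies.
        System.out.println(getHeartbeatInterval(baseIntervalMs, 0, damper));
        // One finished task: 3000 / 1000001 truncates to 0.
        System.out.println(getHeartbeatInterval(baseIntervalMs, 1, damper));
    }
}
```

Under these assumptions a single finished task is enough to make the computed wait zero, which matches the behavior described above.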

        liangly Liyin Liang added a comment -

        Attached a patch to fix this bug. I don't know whether the synchronized block is necessary.
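
The attached patch is not reproduced in this thread. As a hedged sketch of one way to address the bug described earlier, the reset of finishedCount can be moved out of the loop so that it runs even when the loop body is never entered. The class, its constructor, and the helper methods below are hypothetical scaffolding around the quoted loop, and the interval formula is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

class HeartbeatWaitSketch {
    private final AtomicInteger finishedCount = new AtomicInteger(0);
    private final long baseIntervalMs; // assumed base heartbeat interval
    private final int damper;          // out-of-band heartbeat damper
    private long lastHeartbeat;

    public HeartbeatWaitSketch(long baseIntervalMs, int damper, long lastHeartbeat) {
        this.baseIntervalMs = baseIntervalMs;
        this.damper = damper;
        this.lastHeartbeat = lastHeartbeat;
    }

    // Assumed formula: integer division shrinks the interval as tasks finish.
    long getHeartbeatInterval(int finished) {
        return baseIntervalMs / ((long) finished * damper + 1);
    }

    // Hypothetical hook called when a task completes.
    public void taskFinished() {
        finishedCount.incrementAndGet();
        synchronized (finishedCount) {
            finishedCount.notifyAll();
        }
    }

    public void waitForNextHeartbeat() {
        long now = System.currentTimeMillis();
        long remaining =
            (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
        try {
            while (remaining > 0) {
                // The monitor is still needed here because wait() requires it.
                synchronized (finishedCount) {
                    finishedCount.wait(remaining);
                }
                now = System.currentTimeMillis();
                remaining =
                    (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        // Reset outside the loop, so it also runs when the loop was skipped.
        finishedCount.set(0);
        lastHeartbeat = System.currentTimeMillis();
    }

    public int pendingFinished() {
        return finishedCount.get();
    }
}
```

With the reset placed after the loop, a task that finishes before the wait begins still triggers one immediate heartbeat, but the following heartbeat waits for the normal interval again.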

        vicaya Luke Lu added a comment -

        The synchronized is not necessary if finishedCount is already an AtomicInteger. Otherwise the patch lgtm.

        sureshms Suresh Srinivas added a comment -

        Updated patch that addresses Luke's comment.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Great catch Liyin! And patch looks good. Checking it in.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Just committed this to branch-1 and branch-1.1. Thanks Liyin!

        Thanks for the patch update, Suresh.

        sureshms Suresh Srinivas added a comment -

        I merged the change to branch-1.1 to be picked up for 1.1.2.

        sureshms Suresh Srinivas added a comment -

        Realized it was already merged to branch-1.1. Thx Vinod for merging.

        mattf Matt Foley added a comment -

        Closed upon successful release of 1.1.2.


          People

          • Assignee:
            liangly Liyin Liang
            Reporter:
            liangly Liyin Liang
          • Votes:
            0
            Watchers:
            12

            Dates

            • Created:
              Updated:
              Resolved:
