Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 1.0.1, 1.0.2, 1.0.3
    • Fix Version/s: 1.1.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixed a bug in TaskTracker's heartbeat to keep it under control.
    • Target Version/s:
    1. MAPREDUCE-4478.patch
      0.7 kB
      Suresh Srinivas
    2. 4478.diff
      0.8 kB
      Liyin Liang

        Activity

        Liyin Liang added a comment -

        Two configuration properties control the TaskTracker's heartbeat interval: mapreduce.tasktracker.outofband.heartbeat and mapreduce.tasktracker.outofband.heartbeat.damper. If mapreduce.tasktracker.outofband.heartbeat is set to true and mapreduce.tasktracker.outofband.heartbeat.damper is left at its default value (1000000), the TaskTracker may send heartbeats with no interval between them.

        The code to control heartbeat interval is as follows:

        long now = System.currentTimeMillis();

        // accelerate to account for multiple finished tasks up-front
        long remaining =
          (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
        while (remaining > 0) {
          // sleeps for the wait time or
          // until there are *enough* empty slots to schedule tasks
          synchronized (finishedCount) {
            finishedCount.wait(remaining);

            // Recompute
            now = System.currentTimeMillis();
            remaining =
              (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;

            if (remaining <= 0) {
              // Reset count
              finishedCount.set(0);
              break;
            }
          }
        }
        

        In the first computation, if finishedCount is greater than zero, getHeartbeatInterval(finishedCount.get()) returns zero, so remaining is less than or equal to zero and the while loop is skipped. Because the reset happens only inside the loop, finishedCount is never set back to zero, and every later heartbeat computes a zero interval as well.
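
The reasoning above can be sketched as a small standalone simulation. This is only an illustration under stated assumptions: the division-based formula in getHeartbeatInterval, the 3000 ms base interval, and both helper method names are hypothetical and not taken from this issue or from the attached patch. It compares the first-pass computation without a reset against a variant that clears finishedCount when no wait is needed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatIntervalSketch {
    // Hypothetical values chosen for illustration only.
    static final long BASE_INTERVAL_MS = 3000;
    static final int DAMPER = 1000000;

    // Assumed formula: integer division drives the interval to zero
    // as soon as one task has finished and the damper is large.
    static long getHeartbeatInterval(int finishedTasks) {
        return BASE_INTERVAL_MS / ((long) finishedTasks * DAMPER + 1);
    }

    // First computation only, as in the quoted code: when the result is
    // <= 0 the wait loop is skipped and the count is left untouched.
    static long remainingWithoutReset(AtomicInteger finishedCount,
                                      long lastHeartbeat, long now) {
        return (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
    }

    // Same computation, but the count is cleared when no wait is needed
    // (one possible fix, not necessarily the committed one).
    static long remainingWithReset(AtomicInteger finishedCount,
                                   long lastHeartbeat, long now) {
        long remaining = remainingWithoutReset(finishedCount, lastHeartbeat, now);
        if (remaining <= 0) {
            finishedCount.set(0);
        }
        return remaining;
    }

    public static void main(String[] args) {
        AtomicInteger stuck = new AtomicInteger(1);
        remainingWithoutReset(stuck, 1000, 1000);
        System.out.println("without reset, next wait = "
            + remainingWithoutReset(stuck, 1000, 1001));

        AtomicInteger cleared = new AtomicInteger(1);
        remainingWithReset(cleared, 1000, 1000);
        System.out.println("with reset, next wait = "
            + remainingWithReset(cleared, 1000, 1001));
    }
}
```

In the first case the stale count keeps the computed wait at or below zero on every pass; in the second the count is cleared, so the next pass waits for close to the full base interval.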

        Liyin Liang added a comment -

        Attached a patch to fix this bug. I don't know whether the synchronized block is necessary.

        Luke Lu added a comment -

        The synchronized block is not necessary, as finishedCount is already an AtomicInteger. Otherwise the patch lgtm.
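
The distinction behind this remark can be shown with a short standalone sketch. The class and method names here are hypothetical. It relies only on documented JDK behavior: AtomicInteger.set is atomic without holding any lock, whereas Object.wait must be called while holding the object's monitor, which is why the wait in the quoted loop sits inside a synchronized block while a plain reset does not need one.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicResetSketch {
    // set() on an AtomicInteger is atomic by itself; no monitor is required.
    static int resetWithoutLock(AtomicInteger counter) {
        counter.set(0);
        return counter.get();
    }

    // wait() is different: calling it without owning the monitor
    // throws IllegalMonitorStateException.
    static boolean waitWithoutMonitorThrows(AtomicInteger counter) {
        try {
            counter.wait(1);
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report that no monitor error occurred.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicInteger finishedCount = new AtomicInteger(5);
        System.out.println("after reset: " + resetWithoutLock(finishedCount));
        System.out.println("wait without monitor throws: "
            + waitWithoutMonitorThrows(finishedCount));
    }
}
```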

        Suresh Srinivas added a comment -

        Updated patch that addresses Luke's comment.

        Vinod Kumar Vavilapalli added a comment -

        Great catch Liyin! And patch looks good. Checking it in.

        Vinod Kumar Vavilapalli added a comment -

        Just committed this to branch-1 and branch-1.1. Thanks Liyin!

        Thanks for the patch update, Suresh.

        Suresh Srinivas added a comment -

        I merged the change to branch-1.1 to be picked up for 1.1.2.

        Suresh Srinivas added a comment -

        Realized it was already merged to branch-1.1. Thx Vinod for merging.

        Matt Foley added a comment -

        Closed upon successful release of 1.1.2.


          People

          • Assignee:
            Liyin Liang
            Reporter:
            Liyin Liang
          • Votes:
            0
            Watchers:
            12

            Dates

            • Created:
              Updated:
              Resolved: