  Hadoop Map/Reduce / MAPREDUCE-5512

TaskTracker hung after failed reconnect to the JobTracker

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1-win, 1.3.0
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The TaskTracker hangs after a failed attempt to reconnect to the JobTracker.

      This is the problematic piece of code:

          this.distributedCacheManager = new TrackerDistributedCacheManager(
              this.fConf, taskController);
          // Starts a non-daemon cleanup thread before the JobTracker
          // connection has been established.
          this.distributedCacheManager.startCleanupThread();

          // If RPC.waitForProxy() throws here, nothing stops the cleanup
          // thread started above.
          this.jobClient = (InterTrackerProtocol)
              UserGroupInformation.getLoginUser().doAs(
                  new PrivilegedExceptionAction<Object>() {
                    public Object run() throws IOException {
                      return RPC.waitForProxy(InterTrackerProtocol.class,
                          InterTrackerProtocol.versionID,
                          jobTrackAddr, fConf);
                    }
                  });
      

      If RPC.waitForProxy() throws, the TrackerDistributedCacheManager cleanup thread is never stopped. Because it is a non-daemon thread, it keeps the TaskTracker (TT) process alive indefinitely, even though the TaskTracker itself has failed to reconnect.
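
      One possible way to avoid this kind of hang is to stop the cleanup thread whenever the connection attempt fails. The sketch below illustrates that pattern only; it is not taken from the attached patch and does not use Hadoop classes. ReconnectSketch, CleanupManager, and initialize() are hypothetical stand-ins for the TaskTracker, TrackerDistributedCacheManager, and the initialization code quoted above, and the Callable stands in for the RPC.waitForProxy() call.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical stand-ins for illustration; none of these are Hadoop classes.
class ReconnectSketch {

  /** Stand-in for a manager that owns a non-daemon cleanup thread. */
  static class CleanupManager {
    private final Thread cleanupThread = new Thread(() -> {
      try {
        while (true) {
          Thread.sleep(1000); // pretend to do periodic cleanup work
        }
      } catch (InterruptedException e) {
        // Interrupted by stopCleanupThread(): let the thread exit.
      }
    }, "cleanup");

    void startCleanupThread() {
      cleanupThread.start();
    }

    void stopCleanupThread() throws InterruptedException {
      cleanupThread.interrupt();
      cleanupThread.join();
    }

    boolean isRunning() {
      return cleanupThread.isAlive();
    }
  }

  /**
   * Starts the cleanup thread, then runs the connect step. If connecting
   * throws, the cleanup thread is stopped before the exception propagates,
   * so no non-daemon thread is left behind to keep the JVM alive.
   */
  static <T> T initialize(CleanupManager manager, Callable<T> connect)
      throws Exception {
    manager.startCleanupThread();
    boolean connected = false;
    try {
      T proxy = connect.call();
      connected = true;
      return proxy;
    } finally {
      if (!connected) {
        manager.stopCleanupThread();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    CleanupManager manager = new CleanupManager();
    try {
      // Simulate a failed reconnect.
      initialize(manager, () -> {
        throw new IOException("JobTracker unreachable");
      });
    } catch (IOException e) {
      System.out.println(
          "connect failed; cleanup thread alive = " + manager.isRunning());
    }
  }
}
```

      An alternative under the same assumptions would be to start the cleanup thread only after the connect step has succeeded, which removes the need for the failure path entirely.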

        Attachments

        1. hadoop-tasktracker-RD00155DD09100.log
          10 kB
          Ivan Mitic
        2. MAPREDUCE-5512.branch-1.patch
          6 kB
          Ivan Mitic
        3. tt_Hung.txt
          17 kB
          Ivan Mitic

            People

            • Assignee: Ivan Mitic (ivanmi)
            • Reporter: Ivan Mitic (ivanmi)