Hadoop Map/Reduce: MAPREDUCE-2209

TaskTracker's heartbeat hangs for several minutes when copying a large job.jar from HDFS


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: hadoop version: 0.19.1

    Description

      If a job's jar file is very large (e.g. 200 MB or more), the TaskTracker's heartbeat hangs for several minutes while the job is being localized. The jstack output for the relevant threads is as follows:

      "TaskLauncher for task" daemon prio=10 tid=0x0000002b05ee5000 nid=0x1adf runnable [0x0000000042e56000]
         java.lang.Thread.State: RUNNABLE
              at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
              at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
              at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
              at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
              - locked <0x0000002afc892ec8> (a sun.nio.ch.Util$1)
              - locked <0x0000002afc892eb0> (a java.util.Collections$UnmodifiableSet)
              - locked <0x0000002afc8927d8> (a sun.nio.ch.EPollSelectorImpl)
              at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
              at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
              at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
              - locked <0x0000002afce26158> (a java.io.BufferedInputStream)
              at java.io.DataInputStream.readShort(DataInputStream.java:295)
              at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304)
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556)
              - locked <0x0000002afce26218> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
              - locked <0x0000002afce26218> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
              at java.io.DataInputStream.read(DataInputStream.java:83)
              at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
              at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
              at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
              at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
              at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214)
              at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195)
              at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824)
              - locked <0x0000002afce2d260> (a org.apache.hadoop.mapred.TaskTracker$RunningJob)
              at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745)
              at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103)
              at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710)
      
      "Map-events fetcher for all reduce tasks on tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 tid=0x0000002b05ef8000 nid=0x1ada waiting for monitor entry [0x0000000042d55000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582)
              - waiting to lock <0x0000002afce2d260> (a org.apache.hadoop.mapred.TaskTracker$RunningJob)
              at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617)
              - locked <0x0000002a9eefe1f8> (a java.util.TreeMap)
      
      
      "IPC Server handler 2 on 50050" daemon prio=10 tid=0x0000002b050eb000 nid=0x1ab0 waiting for monitor entry [0x000000004234b000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:2684)
              - waiting to lock <0x0000002a9eefe1f8> (a java.util.TreeMap)
              - locked <0x0000002a9eac1de8> (a org.apache.hadoop.mapred.TaskTracker)
              at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
      
      "main" prio=10 tid=0x0000000040113800 nid=0x197d waiting for monitor entry [0x000000004022a000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1196)
              - waiting to lock <0x0000002a9eac1de8> (a org.apache.hadoop.mapred.TaskTracker)
              at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1068)
              at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1799)
              at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2898)
      
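The four threads above form a chain: TaskLauncher holds the TaskTracker$RunningJob monitor inside localizeJob() while it copies job.jar from HDFS; the map-events fetcher holds the TreeMap and waits for that RunningJob; the IPC handler holds the TaskTracker monitor and waits for the TreeMap; and main waits for the TaskTracker monitor in transmitHeartBeat(). The heartbeat therefore cannot be sent until the copy finishes. The following Java sketch reproduces that chain with plain monitors. The class, thread, and field names are hypothetical stand-ins rather than the real TaskTracker code, and the slow copy is simulated with a latch.

```java
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class LockChainDemo {
    // Hypothetical stand-ins for the three monitors that appear in the jstack.
    private static final Object trackerLock = new Object();      // TaskTracker
    private static final Object runningJobLock = new Object();   // TaskTracker$RunningJob
    private static final TreeMap<String, String> mapEvents = new TreeMap<>(); // java.util.TreeMap

    /** Returns true if the simulated heartbeat was stalled for the whole copy and ran only afterwards. */
    public static boolean heartbeatStalledByCopy() throws InterruptedException {
        CountDownLatch copyStarted = new CountDownLatch(1);
        CountDownLatch copyDone = new CountDownLatch(1);
        AtomicBoolean heartbeatSent = new AtomicBoolean(false);

        // TaskLauncher: localizeJob() holds the RunningJob monitor during the copy.
        Thread launcher = new Thread(() -> {
            synchronized (runningJobLock) {
                copyStarted.countDown();
                awaitQuietly(copyDone); // simulated slow HDFS -> local copy of job.jar
            }
        }, "TaskLauncher");
        // Map-events fetcher: holds the TreeMap, then waits for RunningJob.
        Thread fetcher = new Thread(() -> {
            synchronized (mapEvents) {
                synchronized (runningJobLock) { /* reducesInShuffle() */ }
            }
        }, "Map-events fetcher");
        // IPC handler: holds the TaskTracker monitor, then waits for the TreeMap.
        Thread ipcHandler = new Thread(() -> {
            synchronized (trackerLock) {
                synchronized (mapEvents) { /* getMapCompletionEvents() */ }
            }
        }, "IPC Server handler");
        // Heartbeat: transmitHeartBeat() needs the TaskTracker monitor.
        Thread heartbeat = new Thread(() -> {
            synchronized (trackerLock) { heartbeatSent.set(true); }
        }, "heartbeat");

        // Start the threads in an order that builds the chain deterministically.
        launcher.start();
        copyStarted.await();
        fetcher.start();
        awaitState(fetcher, Thread.State.BLOCKED);    // holds TreeMap, blocked on RunningJob
        ipcHandler.start();
        awaitState(ipcHandler, Thread.State.BLOCKED); // holds TaskTracker, blocked on TreeMap
        heartbeat.start();
        awaitState(heartbeat, Thread.State.BLOCKED);  // blocked on TaskTracker

        boolean stalledDuringCopy = !heartbeatSent.get();
        copyDone.countDown(); // the copy finishes and the chain unwinds
        for (Thread t : new Thread[] {launcher, fetcher, ipcHandler, heartbeat}) {
            t.join(TimeUnit.SECONDS.toMillis(5));
        }
        return stalledDuringCopy && heartbeatSent.get();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Polls until the thread reaches the given state, failing after 5 seconds.
    private static void awaitState(Thread t, Thread.State state) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (t.getState() != state) {
            if (System.nanoTime() > deadline) {
                throw new IllegalStateException(t.getName() + " never reached " + state);
            }
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("heartbeat stalled by copy: " + heartbeatStalledByCopy());
    }
}
```

Running main should print `heartbeat stalled by copy: true`: the heartbeat thread never touches RunningJob directly, yet it is held up for exactly as long as the simulated copy.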

      Attachments

        1. MAPREDUCE-2209.patch (2 kB, Subroto Sanyal)
        2. 2209-1.diff (1 kB, Liyin Liang)
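The attached patches are not reproduced here. As an illustration only, one general way to break a chain like this is to keep the RunningJob critical section short: claim the localization under the lock, perform the slow copy without holding it, and let other threads that need the same job wait on a future instead of on the monitor. The sketch below uses hypothetical names (RunningJobSketch, localize, isLocalized) and takes the copy as a Runnable; it is not the TaskTracker implementation.

```java
import java.util.concurrent.CompletableFuture;

public class RunningJobSketch {
    private final Object lock = new Object();      // stands in for the RunningJob monitor
    private CompletableFuture<Void> localization;  // guarded by lock

    /** Localizes the job once; the slow copy runs without holding the lock. */
    public void localize(Runnable slowCopy) {
        CompletableFuture<Void> f;
        boolean owner = false;
        synchronized (lock) {                      // short critical section: claim the work
            if (localization == null) {
                localization = new CompletableFuture<>();
                owner = true;
            }
            f = localization;
        }
        if (owner) {
            try {
                slowCopy.run();                    // e.g. the HDFS -> local copy of job.jar
                f.complete(null);
            } catch (RuntimeException | Error e) {
                f.completeExceptionally(e);        // waiters see the failure (no retry here)
                throw e;
            }
        } else {
            f.join();                              // other launchers wait on the future, not the monitor
        }
    }

    /** Short critical section, like reducesInShuffle(): never waits for the copy. */
    public boolean isLocalized() {
        synchronized (lock) {
            return localization != null
                    && localization.isDone()
                    && !localization.isCompletedExceptionally();
        }
    }
}
```

A failed copy leaves the future completed exceptionally, so later callers see the error rather than retrying; a real fix would also have to decide how retries and cleanup of a partial local jar should work.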


          People

            Assignee: Liyin Liang (liangly)
            Reporter: Liyin Liang (liangly)
            Votes: 1
            Watchers: 7
