Hadoop Common / HADOOP-4780

Task Tracker burns a lot of cpu in calling getLocalCache

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Make DistributedCache remember the size of each cache directory.
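
      The release note amounts to one idea: record a cache directory's size once and reuse that number, instead of measuring the directory again on every lookup. The following is a minimal sketch of that bookkeeping, not the code from the attached patches. The class name CacheSizeTracker and all of its methods are hypothetical and exist only for illustration.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers the size of each localized cache directory
// so that the total can be read without touching the file system.
public class CacheSizeTracker {
    private final Map<File, Long> sizeByDir = new HashMap<>();
    private long totalBytes = 0;

    // Record the size measured once, at the time the directory was localized.
    // Re-adding a directory replaces its previous size.
    public synchronized void add(File cacheDir, long measuredBytes) {
        Long previous = sizeByDir.put(cacheDir, measuredBytes);
        totalBytes += measuredBytes - (previous == null ? 0L : previous);
    }

    // Forget a directory, for example after it has been deleted.
    public synchronized void remove(File cacheDir) {
        Long previous = sizeByDir.remove(cacheDir);
        if (previous != null) {
            totalBytes -= previous;
        }
    }

    // Constant-time read of the remembered total.
    public synchronized long totalBytes() {
        return totalBytes;
    }

    // True when the remembered total exceeds the given limit.
    public synchronized boolean overLimit(long limitBytes) {
        return totalBytes > limitBytes;
    }
}
```

      With bookkeeping like this, checking whether the cache is over its limit costs a map update and a comparison per task, rather than one file system call per cached file.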

      Description

      I have repeatedly seen a TaskTracker max out up to 6 CPUs.
      During those periods, iostat shows that most of the CPU time is system CPU.
      The situation can last for quite a long time.
      While it lasts, a number of threads are in the following state:

      java.lang.Thread.State: RUNNABLE
      at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
      at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
      at java.io.File.exists(File.java:733)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
      at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
      at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)

      I suspect that getLocalCache is too expensive: the trace shows it recursing through a directory tree in FileUtil.getDU.
      Calling it for every task initialization seems wasteful.
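
      To show why a walk like the one in the trace is costly, here is a simplified sketch of a recursive disk-usage computation. It is not the Hadoop source. The method name getDU is taken from the stack trace, while the class RecursiveDu and the visited counter are assumptions added for illustration. Every entry in the tree triggers at least one file system call (exists, isDirectory, length, listFiles), which is consistent with the high system CPU reported above.

```java
import java.io.File;

public class RecursiveDu {
    // Illustration only: number of files and directories the walk touched.
    static long visited = 0;

    // Recursively sums the length of 'dir' and everything below it.
    static long getDU(File dir) {
        visited++;
        if (!dir.exists()) {
            return 0;
        }
        if (!dir.isDirectory()) {
            return dir.length();
        }
        long size = dir.length();
        File[] children = dir.listFiles();
        if (children != null) { // listFiles() returns null on an I/O error
            for (File child : children) {
                size += getDU(child);
            }
        }
        return size;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        long bytes = getDU(root);
        System.out.println(bytes + " bytes, " + visited + " entries visited");
    }
}
```

      Running it against a large directory gives a rough sense of how many entries a single call must stat. If that work is repeated for every task start, the cost grows with both the cache size and the task rate.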

      1. Hadoop-4780-2.patch
        7 kB
        He Yongqiang
      2. 4780-2v19.patch
        7 kB
        Chris Douglas

        Issue Links

          • Is duplicated by: HADOOP-5244

          Activity

          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Component/s mapred [ 12310690 ]
          Chris Douglas made changes -
          Issue Type Improvement [ 4 ] Bug [ 1 ]
          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Chris Douglas made changes -
          Attachment 4780-2v19.patch [ 12403745 ]
          Zheng Shao made changes -
          Issue Type Bug [ 1 ] Improvement [ 4 ]
          dhruba borthakur made changes -
          Link This issue is duplicated by HADOOP-5244 [ HADOOP-5244 ]
          dhruba borthakur made changes -
          Fix Version/s 0.19.2 [ 12313650 ]
          He Yongqiang made changes -
          Fix Version/s 0.19.1 [ 12313473 ]
          Status Open [ 1 ] Patch Available [ 10002 ]
          He Yongqiang made changes -
          Attachment Hadoop-4780-2.patch [ 12396156 ]
          He Yongqiang made changes -
          Attachment Hadoop-4780 [ 12396075 ]
          He Yongqiang made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          He Yongqiang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          He Yongqiang made changes -
          Attachment Hadoop-4780 [ 12396075 ]
          He Yongqiang made changes -
          Attachment Hadoop-4780.patch [ 12395996 ]
          He Yongqiang made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          He Yongqiang made changes -
          Assignee he yongqiang [ he yongqiang ]
          He Yongqiang made changes -
          Fix Version/s 0.19.1 [ 12313473 ]
          Status Open [ 1 ] Patch Available [ 10002 ]
          He Yongqiang made changes -
          Attachment Hadoop-4780.patch [ 12395996 ]
          He Yongqiang made changes -
          Attachment Hadoop-4780 [ 12395821 ]
          He Yongqiang made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          He Yongqiang made changes -
          Release Note Modified FileUtil.getDU to exec the du shell command. make DistributedCache remember the size of each cache directory
          Zheng Shao made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Zheng Shao made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          He Yongqiang made changes -
          Attachment Hadoop-4780 [ 12395821 ]
          He Yongqiang made changes -
          Attachment 4780-2.patch [ 12395804 ]
          He Yongqiang made changes -
          Attachment 4780-2.patch [ 12395804 ]
          He Yongqiang made changes -
          Attachment 4780.patch [ 12395528 ]
          He Yongqiang made changes -
          Attachment 4780-2.patch [ 12395739 ]
          He Yongqiang made changes -
          Attachment 4780-2.patch [ 12395739 ]
          He Yongqiang made changes -
          Affects Version/s 0.19.0 [ 12313211 ]
          Release Note Modified FileUtil.getDU to exec the du shell command.
          Affects Version/s 0.18.2 [ 12313424 ]
          Status Open [ 1 ] Patch Available [ 10002 ]
          He Yongqiang made changes -
          Attachment 4780.patch [ 12395528 ]
          He Yongqiang made changes -
          Attachment FiltUtil.patch [ 12395474 ]
          He Yongqiang made changes -
          Attachment DistributedCache.patch [ 12395473 ]
          He Yongqiang made changes -
          Attachment FiltUtil.patch [ 12395474 ]
          Attachment DistributedCache.patch [ 12395473 ]
          He Yongqiang made changes -
          Attachment FiltUtil.patch [ 12395472 ]
          He Yongqiang made changes -
          Attachment DistributedCache.patch [ 12395471 ]
          He Yongqiang made changes -
          Attachment FiltUtil.patch [ 12395472 ]
          Attachment DistributedCache.patch [ 12395471 ]
          Runping Qi made changes -
          Description
          Affects Version/s 0.18.2 [ 12313424 ]
          Runping Qi created issue -

            People

            • Assignee:
              He Yongqiang
            • Reporter:
              Runping Qi
            • Votes:
              2
            • Watchers:
              9
