Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.19.0
- Component/s: None
- Labels: None
- Hadoop Flags: Reviewed
- Summary: make DistributedCache remember the size of each cache directory
Description
I noticed that a task tracker often maxes out up to 6 CPUs.
During that time, iostat shows that the majority of the usage is system CPU.
That situation can last for quite a long time.
While this was happening, I saw a number of threads in the following state:
java.lang.Thread.State: RUNNABLE
at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
at java.io.File.exists(File.java:733)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)
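The nested getDU frames above come from a recursive directory walk. As a rough sketch (not the exact FileUtil code), such a size computation looks something like the following, with one stack frame and one round of native File metadata calls per directory entry:

import java.io.File;

public class DiskUsageSketch {
    // Rough equivalent of a "du -s" walk: recurse into every subdirectory
    // and sum file lengths. Each nested call is another getDU-style frame,
    // and every entry costs native stat/exists calls, which is where the
    // system CPU goes on large cache directories.
    public static long getDU(File f) {
        if (!f.isDirectory()) {
            return f.length();
        }
        File[] children = f.listFiles();
        if (children == null) {
            return 0;
        }
        long size = 0;
        for (File child : children) {
            size += getDU(child);
        }
        return size;
    }
}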
I suspect that getLocalCache is too expensive, and calling it for every task initialization seems wasteful.
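A minimal sketch of the remembered-size idea from the summary, assuming a hypothetical size cache keyed by directory path (the class and method names below are illustrative, not the actual patch): the first request for a directory pays for one walk, and later task initializations reuse the stored value until the entry is invalidated.

import java.io.File;
import java.util.concurrent.ConcurrentHashMap;

public class CacheDirSizes {
    // Hypothetical cache: absolute directory path -> last computed size.
    private static final ConcurrentHashMap<String, Long> sizes =
            new ConcurrentHashMap<String, Long>();

    // Return the remembered size if present; otherwise walk the directory
    // once and remember the result for subsequent task initializations.
    public static long getSize(File dir) {
        String key = dir.getAbsolutePath();
        Long cached = sizes.get(key);
        if (cached != null) {
            return cached.longValue();
        }
        long size = du(dir);
        sizes.put(key, size);
        return size;
    }

    // Call when files are localized into or deleted from the directory,
    // so the next getSize recomputes instead of returning a stale value.
    public static void invalidate(File dir) {
        sizes.remove(dir.getAbsolutePath());
    }

    private static long du(File f) {
        if (!f.isDirectory()) {
            return f.length();
        }
        File[] children = f.listFiles();
        if (children == null) {
            return 0;
        }
        long size = 0;
        for (File child : children) {
            size += du(child);
        }
        return size;
    }
}

Whether invalidation happens on localization, on deletion, or on a timer is a design choice; the point is simply that the expensive recursive walk no longer runs once per task.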
Attachments
Issue Links
- is duplicated by:
  - HADOOP-5244 Distributed cache spends a lot of time running du -s (Closed)