[MAPREDUCE-1213] TaskTrackers restart is very slow because it deletes distributed cache directory synchronously - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.20.1
Fix Version/s: 0.21.0
Component/s: None
Labels: None
Hadoop Flags: Incompatible change, Reviewed
Release Note:
Directories specified in mapred.local.dir that cannot be created now cause the TaskTracker to fail to start.

Description

We are seeing that when we restart a TaskTracker, it tries to recursively delete all the files in the distributed cache. It invokes FileUtil.fullyDelete(), which is very slow. This means that the TaskTracker cannot join the cluster for an extended period of time (up to 2 hours for us). The problem is acute when the distributed cache holds a few thousand files.
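
One way to avoid blocking startup on the delete, in line with the asynchronous deletion named in the linked issues, is to rename the old cache directory out of the way (a cheap operation on one filesystem) and delete the renamed copy on a background thread. The sketch below is not the attached patch: the class AsyncCacheCleaner, the method moveAndDelete and the ".toBeDeleted" suffix are hypothetical names, and it uses plain java.nio rather than Hadoop's own file utilities.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: rename a directory aside, then delete it in the background. */
class AsyncCacheCleaner {
    // Single daemon thread, so pending deletes never keep the JVM alive.
    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "cache-cleaner");
        t.setDaemon(true);
        return t;
    });

    /**
     * Moves dir to a sibling "trash" name, recreates an empty dir in its place,
     * and schedules the recursive delete. Returns once the rename is done.
     */
    Future<?> moveAndDelete(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            Files.createDirectories(dir);
            return CompletableFuture.completedFuture(null);
        }
        // Assumption: the sibling path is on the same filesystem, so this is a rename.
        Path trash = dir.resolveSibling(dir.getFileName() + ".toBeDeleted." + System.nanoTime());
        Files.move(dir, trash, StandardCopyOption.ATOMIC_MOVE);
        Files.createDirectories(dir);
        return pool.submit(() -> {
            deleteRecursively(trash);
            return null;
        });
    }

    /** Depth-first delete: files first, then each directory after its children. */
    static void deleteRecursively(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException e) throws IOException {
                if (e != null) throw e;
                Files.delete(d);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

With this shape, the caller gets an empty cache directory as soon as moveAndDelete returns, and the slow recursive delete proceeds while the process continues starting up. A real implementation would also need to clean up leftover ".toBeDeleted" directories from a previous run that was interrupted before the background delete finished.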

Attachments

MAPREDUCE-1213.1.patch, 10/Dec/09 22:56, 14 kB, Zheng Shao
MAPREDUCE-1213.2.patch, 11/Dec/09 01:31, 14 kB, Zheng Shao
MAPREDUCE-1213.3.patch, 15/Dec/09 01:00, 14 kB, Zheng Shao
MAPREDUCE-1213.4.patch, 15/Dec/09 20:04, 14 kB, Zheng Shao
MAPREDUCE-1213.branch-0.20.patch, 12/Jan/10 23:09, 16 kB, Zheng Shao
MAPREDUCE-1213.branch-0.20.2.patch, 14/Jan/10 22:26, 14 kB, Zheng Shao

Issue Links

blocks:
MAPREDUCE-1303 Merge org.apache.hadoop.mapred.CleanupQueue with MRAsyncDiskService (Open)
MAPREDUCE-1302 TrackerDistributedCacheManager can delete file asynchronously (Closed)

breaks:
MAPREDUCE-4481 User Log Retention across TT restarts (Reopened)

is blocked by:
HADOOP-6433 Add AsyncDiskService that is used in both hdfs and mapreduce (Closed)

is related to:
HDFS-611 Heartbeats times from Datanodes increase when there are plenty of blocks to delete (Closed)

relates to:
MAPREDUCE-2049 JT and TT should prune invalid local dirs on startup (Resolved)
MAPREDUCE-1382 MRAsyncDiscService should tolerate missing local.dir (Closed)

People

Assignee: Zheng Shao

Reporter: Dhruba Borthakur

Votes: 0

Watchers: 14

Dates

Created: 13/Nov/09 11:47

Updated: 01/Aug/12 13:23

Resolved: 17/Dec/09 02:50