[HADOOP-229] hadoop cp should generate a better number of map tasks

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.3.0
Component/s: fs
Labels: None

Description

hadoop cp currently assigns 10 files to each map task.
With a small number of large files on a large cluster (say, 300 files of 30 GB each on a 300-node cluster), this results in long execution times: only 30 map tasks are created, so most of the cluster sits idle.
It would be better to assign files to tasks so that the entire cluster is utilized: one file per map, with a cap of 10,000 maps in total so as not to overburden the job tracker.
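
The arithmetic behind this proposal can be sketched as follows. This is only an illustration of the sizing rule described above, not the code committed for this issue; the function names `old_map_count` and `plan_map_tasks` and the constant `MAX_MAPS` are hypothetical.

```python
# Illustrative sketch only: names here are hypothetical, not Hadoop APIs.

MAX_MAPS = 10_000        # cap on total maps proposed in the description
OLD_FILES_PER_MAP = 10   # fixed assignment used before the change


def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def old_map_count(num_files, files_per_map=OLD_FILES_PER_MAP):
    """Number of maps when every map is handed a fixed batch of files."""
    return ceil_div(num_files, files_per_map)


def plan_map_tasks(num_files, max_maps=MAX_MAPS):
    """Return (num_maps, max_files_per_map) under the proposed rule.

    One file per map until the cap is reached; past the cap, files are
    spread evenly, so no map receives more than max_files_per_map files.
    """
    if num_files <= 0:
        return 0, 0
    num_maps = min(num_files, max_maps)
    return num_maps, ceil_div(num_files, num_maps)


if __name__ == "__main__":
    # The 300-file example from the description.
    print(old_map_count(300))      # 30 maps under the fixed 10-files rule
    print(plan_map_tasks(300))     # (300, 1): one file per map
    print(plan_map_tasks(25_000))  # (10000, 3): capped at 10,000 maps
```

For the 300-file case in the description, the fixed rule yields 30 maps, while the proposed rule yields 300, enough to occupy every node of a 300-node cluster.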

People

Assignee: Milind Barve

Reporter: Yoram Arnon

Dates

Created: 18/May/06 07:22

Updated: 03/Aug/06 17:46

Resolved: 20/May/06 02:53