[HDFS-202] Add a bulk FileSystem.getFileBlockLocations - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.22.0
Component/s: hdfs-client, namenode
Labels: None
Hadoop Flags: Incompatible change, Reviewed

Description

Currently, map-reduce applications (specifically file-based input formats) use FileSystem.getFileBlockLocations to compute splits. However, they are forced to call it once per file.
The downsides are multiple:

- Even with a few thousand files to process, the number of RPCs quickly becomes noticeable.
- The current implementation of getFileBlockLocations is too slow, since each call results in a 'search' in the namesystem. With a few thousand input files, that means as many RPCs and as many 'searches'.
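To make the per-file cost concrete, here is a toy, stdlib-only Java model of that pattern. It does not talk to HDFS: `ToyNameNode` and `PerFileSplits` are hypothetical names, the "namespace" is just a sorted map from file path to made-up host lists, and an "RPC" is only a counter increment. It assumes the client first lists the directory and then asks for block locations file by file; in real Hadoop code those two steps would be `FileSystem.listStatus` and `FileSystem.getFileBlockLocations`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a namenode (NOT the Hadoop API).
// It only counts how many RPCs a client issues.
final class ToyNameNode {
    // Full file path -> hosts holding its blocks (made-up data).
    private final TreeMap<String, List<String>> files = new TreeMap<>();
    int rpcCount = 0;

    void addFile(String path, List<String> hosts) {
        files.put(path, hosts);
    }

    // One RPC: list the files directly under dir.
    List<String> listStatus(String dir) {
        rpcCount++;
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<String> children = new ArrayList<>();
        for (String path : files.tailMap(prefix).keySet()) {
            if (!path.startsWith(prefix)) break;               // left the directory's key range
            if (path.indexOf('/', prefix.length()) < 0) {      // direct child only
                children.add(path);
            }
        }
        return children;
    }

    // One RPC per call: an independent 'search' for a single path.
    List<String> getFileBlockLocations(String path) {
        rpcCount++;
        return files.get(path);
    }
}

// Hypothetical client mirroring today's pattern: 1 listing RPC + 1 RPC per file.
final class PerFileSplits {
    static Map<String, List<String>> locate(ToyNameNode nn, String dir) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String path : nn.listStatus(dir)) {
            result.put(path, nn.getFileBlockLocations(path));
        }
        return result;
    }
}
```

For a directory with N files this model issues 1 + N RPCs, which is the cost the bullets above describe.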

It would be nice to have a FileSystem.getFileBlockLocations that can take a directory and return the block locations for all files in that directory. This would eliminate the per-file RPCs and replace the per-file 'search' with a single 'scan'.
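A matching toy model of the proposed directory-level call is sketched below: a single RPC walks the directory's contiguous key range in a sorted map (the 'scan') and returns every direct child file together with its block hosts. `ToyBulkNameNode` and `listLocated` are hypothetical names, and the model is stdlib-only; per the linked HADOOP-6870, the API that eventually landed in Hadoop for this is `FileSystem#listLocatedStatus`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of the bulk call (NOT the Hadoop API).
final class ToyBulkNameNode {
    // Full file path -> hosts holding its blocks (made-up data).
    private final TreeMap<String, List<String>> files = new TreeMap<>();
    int rpcCount = 0;

    void addFile(String path, List<String> hosts) {
        files.put(path, hosts);
    }

    // One RPC: because the map is sorted, all paths under dir are contiguous,
    // so a single ordered scan collects every direct child and its hosts.
    Map<String, List<String>> listLocated(String dir) {
        rpcCount++;
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : files.tailMap(prefix).entrySet()) {
            String path = e.getKey();
            if (!path.startsWith(prefix)) break;               // end of the directory's range
            if (path.indexOf('/', prefix.length()) < 0) {      // skip files in subdirectories
                result.put(path, e.getValue());
            }
        }
        return result;
    }
}
```

The design point is that the per-file lookups are replaced by one sequential pass over the directory, so the RPC count stays at one regardless of how many files the directory holds.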

When I tested this with terasort, a moderate job with 8000 input files, the runtime halved from the current 8s to 4s. This is clearly even more important for latency-sensitive applications.
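As a rough back-of-the-envelope reading of the terasort figure, assuming (as a simplification) that the entire 4s saving comes from dropping the per-file RPC and 'search', the average overhead removed per input file is:

```latex
\frac{8\,\text{s} - 4\,\text{s}}{8000\ \text{files}}
  = \frac{4000\,\text{ms}}{8000\ \text{files}}
  = 0.5\,\text{ms per file}
```

Under that assumption the overhead grows linearly with the number of input files.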

Attachments

- hdfsListFiles5.patch (48 kB), Hairong Kuang, 11/Aug/10 19:32
- hdfsListFiles4.patch (47 kB), Hairong Kuang, 11/Aug/10 18:20
- hdfsListFiles3.patch (54 kB), Hairong Kuang, 02/Aug/10 22:30
- hdfsListFiles2.patch (42 kB), Hairong Kuang, 31/Jul/10 00:07
- hdfsListFiles1.patch (40 kB), Hairong Kuang, 27/Jul/10 04:42
- hdfsListFiles.patch (43 kB), Hairong Kuang, 23/Jul/10 19:03

Issue Links

blocks:
- MAPREDUCE-1981 Improve getSplits performance by using listLocatedStatus (Closed)

relates to:
- HADOOP-6870 Add FileSystem#listLocatedStatus to list a directory's content together with each file's block locations (Closed)

People

Assignee: Hairong Kuang
Reporter: Arun Murthy
Votes: 1
Watchers: 14

Dates

Created: 08/May/09 17:21
Updated: 12/Dec/11 06:20
Resolved: 11/Aug/10 20:46