[MAPREDUCE-5907] Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing - ASF JIRA

Details

Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: client
Labels: None

Description

FileInputFormat (both the mapreduce and mapred implementations) lists input directories recursively when calculating splits, but it does so level by level. To discover the files under /foo/bar, it first lists /foo/bar to get the immediate children, then makes the same call on each child directory to discover its immediate children, and so on. This doesn't scale well for object-store-based fs implementations such as s3 and swift, because every listStatus call ends up being a web service call to the backend. When a large number of files are considered for input, this makes the getSplits() call slow.
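To make the cost concrete, here is a minimal sketch of level-by-level listing against a toy in-memory store. It is not Hadoop code: `ToyStore`, `addFile`, `listCalls`, and `listFilesLevelByLevel` are hypothetical names invented for illustration, and each `listStatus` call on the toy store stands in for one web service request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of level-by-level listing; an illustration, not FileInputFormat itself. */
public class LevelByLevelListing {

    /** Hypothetical in-memory store mapping a directory to its immediate children. */
    static final class ToyStore {
        private final Map<String, List<String>> children = new TreeMap<>();
        int listCalls = 0; // each call stands in for one request to the backend

        /** Registers a file and links every ancestor directory to its child. */
        void addFile(String path) {
            String child = path;
            int slash = child.lastIndexOf('/');
            while (slash > 0) {
                String parent = child.substring(0, slash);
                List<String> kids = children.computeIfAbsent(parent, k -> new ArrayList<>());
                if (!kids.contains(child)) {
                    kids.add(child);
                }
                child = parent;
                slash = child.lastIndexOf('/');
            }
        }

        boolean isDirectory(String path) {
            return children.containsKey(path);
        }

        /** Returns only the immediate children of dir and counts the call. */
        List<String> listStatus(String dir) {
            listCalls++;
            return children.getOrDefault(dir, Collections.emptyList());
        }
    }

    /** Discovers all files under root by listing one directory per call. */
    static List<String> listFilesLevelByLevel(ToyStore store, String root) {
        List<String> files = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            String dir = pending.poll();
            for (String child : store.listStatus(dir)) {
                if (store.isDirectory(child)) {
                    pending.add(child); // listed later with its own call
                } else {
                    files.add(child);
                }
            }
        }
        return files;
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.addFile("/foo/bar/a/part-0");
        store.addFile("/foo/bar/a/part-1");
        store.addFile("/foo/bar/b/c/part-0");
        List<String> files = listFilesLevelByLevel(store, "/foo/bar");
        // prints "3 files, 4 list calls"
        System.out.println(files.size() + " files, " + store.listCalls + " list calls");
    }
}
```

In this model the number of list calls equals the number of directories visited (here /foo/bar, a, b, and c), so deeply nested inputs pay one round trip per directory regardless of how few files they hold.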

This patch adds a new set of recursive list APIs that give fs implementations an opportunity to optimize. Behavior remains the same for other implementations: a default implementation is provided, so they don't have to implement anything new. Object-store-based fs implementations, however, can improve listing performance with a simple change that passes the recursive flag as true (as shown in the patch).
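The design idea can be sketched as follows, under stated assumptions. This is not the API added by the patch: `ToyFileSystem`, `listChildren`, `listFilesRecursively`, `FlatStore`, and `OptimizedFlatStore` are hypothetical names. The sketch only shows the pattern of a default recursive listing that falls back to level-by-level traversal, plus an object-store-style override that answers the same query with a single prefix scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy illustration of a recursive-list API with an overridable default. */
public class RecursiveListingSketch {

    /** Hypothetical listing interface; the names are not taken from the patch. */
    interface ToyFileSystem {
        /** Immediate children of dir; directories carry a trailing '/'. */
        List<String> listChildren(String dir);

        /** Default: walk level by level, so existing implementations need no changes. */
        default List<String> listFilesRecursively(String dir) {
            List<String> files = new ArrayList<>();
            for (String child : listChildren(dir)) {
                if (child.endsWith("/")) {
                    String subDir = child.substring(0, child.length() - 1);
                    files.addAll(listFilesRecursively(subDir));
                } else {
                    files.add(child);
                }
            }
            return files;
        }
    }

    /** Toy flat key space, as in an object store; relies on the default traversal. */
    static class FlatStore implements ToyFileSystem {
        final TreeSet<String> keys = new TreeSet<>();
        int requests = 0; // each counted call stands in for one backend request

        @Override
        public List<String> listChildren(String dir) {
            requests++;
            String prefix = dir + "/";
            TreeSet<String> out = new TreeSet<>();
            for (String key : keys.tailSet(prefix)) {
                if (!key.startsWith(prefix)) {
                    break; // sorted keys: nothing further shares the prefix
                }
                int next = key.indexOf('/', prefix.length());
                out.add(next < 0 ? key : key.substring(0, next + 1));
            }
            return new ArrayList<>(out);
        }
    }

    /** Same store, but the recursive listing is served by one prefix scan. */
    static class OptimizedFlatStore extends FlatStore {
        @Override
        public List<String> listFilesRecursively(String dir) {
            requests++;
            String prefix = dir + "/";
            List<String> files = new ArrayList<>();
            for (String key : keys.tailSet(prefix)) {
                if (!key.startsWith(prefix)) {
                    break;
                }
                files.add(key);
            }
            return files;
        }
    }

    public static void main(String[] args) {
        FlatStore plain = new FlatStore();
        OptimizedFlatStore fast = new OptimizedFlatStore();
        String[] sample = {"/foo/bar/a/part-0", "/foo/bar/a/part-1", "/foo/bar/b/c/part-0"};
        for (String key : sample) {
            plain.keys.add(key);
            fast.keys.add(key);
        }
        plain.listFilesRecursively("/foo/bar");
        fast.listFilesRecursively("/foo/bar");
        // prints "default: 4 requests, optimized: 1 request"
        System.out.println("default: " + plain.requests + " requests, optimized: "
                + fast.requests + " request");
    }
}
```

Both classes return the same file list for the same input; only the request count differs, which is the property the description relies on when it says behavior stays the same for implementations that do not override the new method.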

Attachments

- MAPREDUCE-5907-3.patch, 03/Jun/14 00:37, 33 kB, Sumit Kumar
- MAPREDUCE-5907-2.patch, 02/Jun/14 16:15, 33 kB, Sumit Kumar
- MAPREDUCE-5907.patch, 28/May/14 19:55, 11 kB, Sumit Kumar

Issue Links

depends upon

- HADOOP-10634 Add recursive list apis to FileSystem to give implementations an opportunity for optimization (Resolved)

is depended upon by

- HADOOP-14302 Test MR split optimisation with recursive listing (Open)
- HADOOP-16829 Über-jira: S3A Hadoop 3.3.1 features (Resolved)

is related to

- MAPREDUCE-7092 MR examples to work better against cloud stores (Resolved)

People

Assignee: Sumit Kumar

Reporter: Sumit Kumar

Votes: 1

Watchers: 7

Dates

Created: 28/May/14 19:54

Updated: 05/Jan/22 17:16