Type: New Feature
Affects Version/s: None
Fix Version/s: 0.21.0
Hadoop Flags: Incompatible change, Reviewed
Currently HDFS issues a single RPC from the client to the NameNode to list a directory. However, some directories are large, containing thousands or millions of items. Listing such a large directory in one RPC has a few shortcomings:
1. The list operation holds the global fsnamesystem lock for a long time, thus blocking other requests. If a large number (thousands) of such list requests hit the NameNode in a short period of time, the NameNode will be significantly slowed down. Users end up seeing longer response times or lost connections to the NameNode.
2. The response message is uncontrollably big. We observed a response as big as 50 MB when listing a directory of 300 thousand items. Even with the optimization introduced in HDFS-946, which may be able to cut the response by 20-50%, the response size will still be on the order of tens of megabytes.
I propose to implement directory listing using multiple RPCs. Here is the plan:
1. Each getListing RPC has an upper limit on the number of items returned. This limit could be made configurable, but I am thinking of setting it to a fixed number like 500.
2. Each RPC additionally specifies a start position for the listing request. I am thinking of using the last item of the previous listing RPC as the indicator. Since the NameNode stores all items in a directory as a sorted array, it can use the last item to locate the start item of the next listing even if that item is deleted between the two consecutive calls. This has the advantage of avoiding duplicate entries at the client side.
3. The return value additionally indicates whether the whole directory has been listed. If the client sees a false flag, it continues to issue another RPC until the listing is complete.
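The three steps above can be sketched as follows. This is an illustrative sketch only, not the actual HDFS API: the names `NameNodeStub`, `Partial`, `getListing(dir, startAfter, limit)` and the helper `startIndex` are all assumptions made up for this example. It shows both the client-side loop (steps 1 and 3) and how the NameNode can use a binary search over its sorted array to resume after the last returned item even if that item has since been deleted (step 2).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PagedListing {

    /** One page of directory entries plus a flag saying whether more remain. */
    public static class Partial {
        public final List<String> entries;
        public final boolean hasMore;
        public Partial(List<String> entries, boolean hasMore) {
            this.entries = entries;
            this.hasMore = hasMore;
        }
    }

    /** Stand-in for the NameNode RPC; startAfter is "" on the first call. */
    public interface NameNodeStub {
        Partial getListing(String dir, String startAfter, int limit);
    }

    // Fixed per-RPC cap from the proposal (could instead be configurable).
    public static final int LIMIT = 500;

    /**
     * Server side: locate the first entry strictly after startAfter in the
     * sorted array. If startAfter was deleted between calls, binarySearch
     * returns a negative insertion point, which still resumes correctly
     * without duplicating entries on the client.
     */
    public static int startIndex(List<String> sorted, String startAfter) {
        if (startAfter.isEmpty()) return 0;
        int pos = Collections.binarySearch(sorted, startAfter);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    /** Client side: keep issuing RPCs until the has-more flag is false. */
    public static List<String> listAll(NameNodeStub nn, String dir) {
        List<String> all = new ArrayList<>();
        String startAfter = "";
        Partial page;
        do {
            page = nn.getListing(dir, startAfter, LIMIT);
            all.addAll(page.entries);
            if (!page.entries.isEmpty()) {
                // Remember the last item as the start key of the next RPC.
                startAfter = page.entries.get(page.entries.size() - 1);
            }
        } while (page.hasMore);
        return all;
    }
}
```

Using the last returned name as the resume key, rather than a numeric offset, is what makes the scheme robust to concurrent deletes: an offset would shift when entries are removed, while a key-based binary search always lands on the next remaining entry.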
This proposal changes the semantics of large directory listing: listing is no longer an atomic operation if the directory's contents change while the listing is in progress.