[HDFS-14233] Implement DistributedFileSystem#listStatus(Path[]) by adding a batching listStatus RPC call to NameNode - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: namenode
Labels: None

Description

~~HDFS-985~~ fixed an important problem for HDFS: listing a huge directory took too long, so the listing had to be paged.

This Jira proposes to solve the problem at the other extreme: making it more efficient to list thousands of directories that each contain very few files.

Note that this is already supported in the FileSystem API, but implemented using a loop:

public FileStatus[] listStatus(Path[] files, PathFilter filter) throws FileNotFoundException, IOException {
  ArrayList<FileStatus> results = new ArrayList();
  for(int i = 0; i < files.length; ++i) {
    this.listStatus(results, files[i], filter);
  }
  return (FileStatus[])results.toArray(new FileStatus[results.size()]);
}

HIVE-9736 is a real use case in Hive, where we need to list the files in many directories (partitions, in Hive terms). That Jira was proposed on the assumption that listStatus(Path[]) is more efficient in HDFS, but it actually is not: it issues one listing call per path.

There are 2 ways to achieve this:

A. Issue parallel RPCs to NameNode to speed things up;
B. Issue a single RPC to NameNode with all the paths;

While A is a simpler and safer change, it adds much more load on the NameNode, because the overhead of an RPC (and the locking) is huge compared with the actual work of listing a small directory.
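For comparison, option A could look roughly like the sketch below. It is only an illustration under stated assumptions: `ParallelListingSketch`, `listAll`, and the `lister` function are hypothetical names, and `lister` stands in for a per-directory call such as FileSystem#listStatus(Path). The point is that the client finishes sooner, yet the number of calls reaching the NameNode is unchanged: one per directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of option A: one listing call per directory, issued in parallel.
class ParallelListingSketch {

  // `lister` is an assumed stand-in for a per-directory listing call;
  // every invocation models one separate RPC to the NameNode.
  static List<String> listAll(List<String> dirs,
                              Function<String, List<String>> lister,
                              int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Submit one task (one simulated RPC) per directory.
      List<Future<List<String>>> futures = new ArrayList<>();
      for (String dir : dirs) {
        Callable<List<String>> task = () -> lister.apply(dir);
        futures.add(pool.submit(task));
      }
      // Collect results in the same order as the input paths.
      List<String> results = new ArrayList<>();
      for (Future<List<String>> future : futures) {
        results.addAll(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while listing", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("listing failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

With N directories this still costs N RPCs and N lock acquisitions on the NameNode; only the client-side wall-clock time improves.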

We propose to do option B above instead. It can take advantage of the paging mechanism from ~~HDFS-985~~ to ensure we don't overload the NameNode. This will greatly speed up the Hive use case above while also reducing the load on the NameNode for such use cases, since the NameNode can process many directory listings while paying the overhead of a single RPC and a single Global Shared Lock acquisition.
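A minimal, self-contained sketch of how a batched and paged listing might behave is shown below. Everything in it is an assumption made for illustration: the class `BatchedListingSketch`, the `Page` type, the (directory index, entry index) cursor, and the in-memory `namespace` map that stands in for the NameNode. It is not the RPC proposed in this Jira, whose signature is not defined here. It only shows the idea that one call serves many directories while a per-call entry limit bounds the work done in each call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of option B: many directories per call, with paging.
class BatchedListingSketch {

  // One page of results plus an assumed cursor for resuming the listing.
  static final class Page {
    final List<String> entries = new ArrayList<>();
    int nextDir;     // index of the directory to resume from
    int nextEntry;   // index of the entry within that directory
    boolean hasMore; // true if another call is needed
  }

  // Stand-in for the server side: handles many directories in ONE call,
  // but returns at most `limit` entries so a single call stays bounded.
  static Page listBatch(Map<String, List<String>> namespace, List<String> dirs,
                        int startDir, int startEntry, int limit) {
    if (limit < 1) {
      throw new IllegalArgumentException("limit must be positive");
    }
    Page page = new Page();
    int d = startDir;
    int e = startEntry;
    while (d < dirs.size() && page.entries.size() < limit) {
      List<String> children =
          namespace.getOrDefault(dirs.get(d), Collections.<String>emptyList());
      while (e < children.size() && page.entries.size() < limit) {
        page.entries.add(dirs.get(d) + "/" + children.get(e));
        e++;
      }
      if (e >= children.size()) { // directory exhausted, move to the next one
        d++;
        e = 0;
      }
    }
    page.nextDir = d;
    page.nextEntry = e;
    page.hasMore = d < dirs.size();
    return page;
  }

  // Client side: keeps calling until the cursor reports no more entries.
  static List<String> listAll(Map<String, List<String>> namespace,
                              List<String> dirs, int limit) {
    List<String> results = new ArrayList<>();
    int d = 0;
    int e = 0;
    Page page;
    do {
      page = listBatch(namespace, dirs, d, e, limit);
      results.addAll(page.entries);
      d = page.nextDir;
      e = page.nextEntry;
    } while (page.hasMore);
    return results;
  }
}
```

In this model, listing thousands of small directories takes roughly (total entries / limit) calls instead of one call per directory, which is where the saving in RPC and lock overhead would come from.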