[HDFS-14663] HttpFS: LISTSTATUS_BATCH does not return batches - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.0
Fix Version/s: None
Component/s: httpfs
Labels: None
Target Version/s: 3.5.0

Description

The webhdfs protocol supports a LISTSTATUS_BATCH operation that retrieves the file listing for a large directory in chunks.

When using the webhdfs service embedded in the namenode, this works as expected. When using HttpFS, however, any call to LISTSTATUS_BATCH returns the entire listing in one response rather than in batches, so it effectively behaves like LISTSTATUS.
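To make the comparison concrete, here is a minimal sketch of how the request URL for each batch can be built, following the documented WebHDFS REST form (op=LISTSTATUS_BATCH with an optional startAfter child name). The class name, the host name, and the directory path are hypothetical, and authentication parameters such as user.name are left out. Sending the same request to the namenode and to HttpFS and comparing the remainingEntries field of the responses is one way to observe the difference described above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ListStatusBatchUrl {

    /**
     * Builds a LISTSTATUS_BATCH request URL.
     *
     * @param baseUrl    scheme, host and port of the service (hypothetical values in main)
     * @param dirPath    absolute directory path, starting with "/"
     * @param startAfter last child name of the previous batch, or null for the first batch
     */
    static String build(String baseUrl, String dirPath, String startAfter) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("/webhdfs/v1")
                .append(dirPath)
                .append("?op=LISTSTATUS_BATCH");
        if (startAfter != null) {
            url.append("&startAfter=").append(encode(startAfter));
        }
        return url.toString();
    }

    // URL-encodes a query parameter value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical HttpFS endpoint and directory, for illustration only.
        String base = "http://httpfs.example.com:14000";
        System.out.println(build(base, "/data/big", null));
        System.out.println(build(base, "/data/big", "part-00099"));
    }
}
```

A client would normally repeat the request, passing the last child name it received as startAfter, until the response reports no remaining entries.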

This appears to happen because HttpFS falls back to the method org.apache.hadoop.fs.FileSystem#listStatusBatch. That method is intended to be overridden, but the FileSystem implementation used by HttpFS does not override it, which leads to this limitation.

This feature (LISTSTATUS_BATCH) was added to HttpFS by HDFS-10823, but based on my testing it does not work as intended. I suspect the cause is that the above Jira added the listStatusBatch operation to WebHdfsFileSystem and HttpFSFileSystem, while behind the scenes HttpFS seems to use DistributedFileSystem. It therefore falls back to the default implementation, org.apache.hadoop.fs.FileSystem#listStatusBatch, which returns all entries in a single batch.
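The suspected mechanism can be illustrated with a small, self-contained model. This is not Hadoop code: BaseFs, PagingFs, Batch, and countBatches are hypothetical stand-ins. BaseFs plays the role of a file system that inherits the default listStatusBatch (whole listing, no continuation), and PagingFs plays the role of one that overrides it to page through the children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListStatusBatchSketch {

    /** One page of a listing plus a flag saying whether more entries remain. */
    static final class Batch {
        final List<String> entries;
        final boolean hasMore;

        Batch(List<String> entries, boolean hasMore) {
            this.entries = entries;
            this.hasMore = hasMore;
        }
    }

    /** Hypothetical file system that does NOT override the batch method. */
    static class BaseFs {
        protected final List<String> children; // child names, assumed sorted

        BaseFs(List<String> children) {
            this.children = children;
        }

        // Default behavior: the whole listing as one batch, never a continuation.
        Batch listStatusBatch(String startAfter) {
            return new Batch(children, false);
        }
    }

    /** Hypothetical file system that overrides the batch method to page. */
    static class PagingFs extends BaseFs {
        private final int batchSize;

        PagingFs(List<String> children, int batchSize) {
            super(children);
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.batchSize = batchSize;
        }

        @Override
        Batch listStatusBatch(String startAfter) {
            // Skip every child up to and including startAfter.
            int from = 0;
            while (startAfter != null && from < children.size()
                    && children.get(from).compareTo(startAfter) <= 0) {
                from++;
            }
            int to = Math.min(from + batchSize, children.size());
            return new Batch(new ArrayList<>(children.subList(from, to)), to < children.size());
        }
    }

    /** Drains a listing and returns how many batch calls it took. */
    static int countBatches(BaseFs fs) {
        int calls = 0;
        String startAfter = null;
        Batch batch;
        do {
            batch = fs.listStatusBatch(startAfter);
            calls++;
            if (!batch.entries.isEmpty()) {
                startAfter = batch.entries.get(batch.entries.size() - 1);
            }
        } while (batch.hasMore);
        return calls;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(countBatches(new BaseFs(names)));      // prints 1
        System.out.println(countBatches(new PagingFs(names, 2))); // prints 3
    }
}
```

In this model the caller's loop is identical in both cases; only the overridden method decides whether the listing arrives in one batch or several, which matches the behavior reported for HttpFS.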

Issue Links

is caused by: HDFS-10823 Implement HttpFSFileSystem#listStatusIterator (Resolved)

People

Assignee: Siyao Meng

Reporter: Stephen O'Donnell

Votes: 0

Watchers: 6

Dates

Created: 23/Jul/19 16:22

Updated: 04/Jan/24 08:12