[HADOOP-13704] S3A getContentSummary() to move to listFiles(recursive) to count children; instrument use - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.8.0
Fix Version/s: 3.3.5
Component/s: fs/s3
Labels:
- pull-request-available

Target Version/s: 3.4.0

Description

Hive and parts of Spark use getContentSummary() to get summary statistics for a filesystem path. This is very expensive on S3A (and on any other object store), especially as the base implementation does a recursive tree walk.

Because of HADOOP-13208, we have a full enumeration of files under a path without directory costs. S3A can and should switch getContentSummary() to listFiles(recursive) to speed up the places where the operation is called.
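To illustrate the idea, here is a minimal sketch of deriving the summary counts from a single flat listing rather than a per-directory walk. It is not the S3A implementation: the class `FlatContentSummary`, its `summarize` method, and the map of key to size standing in for a recursive listing are all hypothetical. It also assumes that keys ending in `/` are directory markers and that the base directory itself counts as one directory.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: summarize a flat listing of object keys in one pass. */
final class FlatContentSummary {
    final long fileCount;
    final long directoryCount;
    final long length;

    FlatContentSummary(long fileCount, long directoryCount, long length) {
        this.fileCount = fileCount;
        this.directoryCount = directoryCount;
        this.length = length;
    }

    /**
     * @param base    key prefix of the directory to summarize, without a trailing slash
     * @param listing object key to size in bytes, standing in for a flat recursive listing
     */
    static FlatContentSummary summarize(String base, Map<String, Long> listing) {
        String prefix = base.isEmpty() ? "" : base + "/";
        Set<String> dirs = new HashSet<>();
        long files = 0;
        long bytes = 0;
        for (Map.Entry<String, Long> e : listing.entrySet()) {
            String key = e.getKey();
            if (!key.startsWith(prefix)) {
                continue; // outside the directory being summarized
            }
            String relative = key.substring(prefix.length());
            if (relative.isEmpty()) {
                continue; // marker object for the base directory itself
            }
            // Each '/' in the relative key ends the name of an ancestor directory.
            int slash = relative.indexOf('/');
            while (slash >= 0) {
                dirs.add(relative.substring(0, slash));
                slash = relative.indexOf('/', slash + 1);
            }
            // Assumption: keys ending in '/' are directory markers, not files.
            if (!relative.endsWith("/")) {
                files++;
                bytes += e.getValue();
            }
        }
        // Assumption: the base directory counts as one directory.
        return new FlatContentSummary(files, dirs.size() + 1, bytes);
    }
}
```

Directories are inferred from the keys themselves, so the sketch needs no extra request per directory; the cost scales with the number of objects listed rather than with the depth or breadth of the tree.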

Also:

- API call needs FS spec and contract tests
- S3A could instrument invocation, so as to enable real-world popularity to be measured
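The instrumentation point could be as simple as a per-operation invocation counter. The sketch below is illustrative only: `InvocationCounters` and the operation name string are hypothetical and are not taken from S3A's own instrumentation classes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: thread-safe counters keyed by operation name. */
final class InvocationCounters {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Record one invocation of the named operation. */
    void increment(String operation) {
        counters.computeIfAbsent(operation, k -> new LongAdder()).increment();
    }

    /** Number of recorded invocations; zero if the operation was never called. */
    long get(String operation) {
        LongAdder adder = counters.get(operation);
        return adder == null ? 0L : adder.sum();
    }
}
```

A filesystem client would call `increment("getContentSummary")` at the start of the operation and publish the totals with its other statistics, so that real-world usage can be compared across operations.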

Issue Links

is duplicated by

- HADOOP-16468 S3AFileSystem.getContentSummary() to use listFiles(recursive) (Resolved)
- HADOOP-13829 S3A getContentSummary to use flat listFiles instead of treewalk (Resolved)

links to

- GitHub Pull Request #3978

People

Assignee: Ahmar Suhail

Reporter: Steve Loughran

Votes: 0

Watchers: 4

Dates

Created: 10/Oct/16 16:43

Updated: 22/Mar/22 13:49

Resolved: 22/Mar/22 13:49
