[HADOOP-18753] S3AFileSystem doesn't consistently handle prefixes that are both files and directories between versions - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 3.3.4
Fix Version/s: None
Component/s: fs/s3
Labels: None

Description

We have a prefix structure in which the prefix Spark reads is both a file and a directory: s3://a/b is the file we are trying to read, but s3://a/b/c is also a file. In 3.2.1, listStatus identifies a/b as a file, but after a change, 3.3.4 identifies a/b as a directory and tries to read a/b/c instead of a/b.

When s3GetFileStatus is called on the path with the StatusProbeEnum HEAD probe, the path is reported as a file. However, innerListStatus first assumes that any non-empty prefix is a directory; it calls s3GetFileStatus only for empty directories and for the listObjects results under the prefix.
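
To make the ordering difference concrete, here is a minimal toy model in plain Java. It is not the S3A implementation: the class `PrefixProbeSketch`, its in-memory key set, and the helpers `headObject`, `hasChildren`, `classifyHeadFirst`, and `classifyListFirst` are all hypothetical names invented for illustration. It only shows how the same two keys can be classified differently depending on whether the exact-key probe or the prefix listing runs first.

```java
import java.util.List;
import java.util.TreeSet;

public class PrefixProbeSketch {
    // Toy stand-in for a bucket: a sorted set of object keys.
    // "a/b" is an object, and "a/b/c" is an object underneath it.
    static final TreeSet<String> KEYS = new TreeSet<>(List.of("a/b", "a/b/c"));

    // HEAD-style probe: is there an object at exactly this key?
    static boolean headObject(String key) {
        return KEYS.contains(key);
    }

    // LIST-style probe: is there at least one key under "key/"?
    static boolean hasChildren(String key) {
        String prefix = key + "/";
        String next = KEYS.ceiling(prefix); // smallest key >= prefix
        return next != null && next.startsWith(prefix);
    }

    // Exact-key probe first: an object at the key wins.
    static String classifyHeadFirst(String key) {
        if (headObject(key)) {
            return "file";
        }
        return hasChildren(key) ? "directory" : "not found";
    }

    // Listing first: any non-empty prefix is treated as a directory,
    // and the exact-key probe is only a fallback.
    static String classifyListFirst(String key) {
        if (hasChildren(key)) {
            return "directory";
        }
        return headObject(key) ? "file" : "not found";
    }

    public static void main(String[] args) {
        System.out.println(classifyHeadFirst("a/b")); // prints "file"
        System.out.println(classifyListFirst("a/b")); // prints "directory"
    }
}
```

In this sketch a key with no children, such as a/b/c, is classified as a file by both orderings; only a key that is simultaneously an object and a non-empty prefix gets a different answer.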

Is this a known issue, and are there any suggestions for working around it without changing the prefix structure?

Issue Links

relates to: HADOOP-17400 Optimize S3A for maximum performance in directory listings (Resolved)

People

Assignee: Unassigned
Reporter: Helen Weng
Votes: 0
Watchers: 3

Dates

Created: 24/May/23 18:36
Updated: 25/May/23 15:49
Resolved: 25/May/23 14:23