Details
- Bug
- Status: Resolved
- Major
- Resolution: Won't Fix
- 3.3.4
- None
- None
Description
We have a prefix structure in which the path Spark reads is both a file and a directory: s3://a/b is the file we are trying to read, but s3://a/b/c is also a file. In 3.2.1, listStatus identified a/b as a file; after a change in 3.3.4 it identifies a/b as a directory, and Spark tries to read a/b/c instead of a/b.
When s3GetFileStatus is called on the path with the StatusProbeEnum HEAD probe, the path is returned as a file. However, innerListStatus first assumes that any non-empty prefix is a directory; it calls s3GetFileStatus only on empty directories and on the listObjects results for the prefix.
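To make the difference concrete, here is a toy model of a flat object store in Python. It is only a sketch of the two probe orders described above, not the S3A implementation: the dict stands in for the bucket, and `head`, `list_prefix`, `head_first_list_status` and `listing_first_list_status` are hypothetical names invented for this illustration.

```python
# Toy model: a bucket is a flat mapping of keys to bytes; there are no real directories.
# Mirrors the report: "b" is an object, and "b/c" is an object under the same prefix.
BUCKET = {"b": b"parent data", "b/c": b"child data"}


def head(store, key):
    """HEAD-style probe: True if an object exists at exactly this key."""
    return key in store


def list_prefix(store, key):
    """LIST-style probe: all keys strictly under key + '/'."""
    prefix = key.rstrip("/") + "/"
    return sorted(k for k in store if k.startswith(prefix))


def head_first_list_status(store, key):
    """Probe the object first; list the prefix only if no object exists at the key.

    This reproduces the outcome reported for 3.2.1 (a/b seen as a file).
    """
    if head(store, key):
        return [("file", key)]
    return [("file", k) for k in list_prefix(store, key)]


def listing_first_list_status(store, key):
    """List the prefix first; a non-empty listing means 'directory'.

    This reproduces the outcome reported for 3.3.4 (a/b/c returned instead of a/b).
    The HEAD probe is only reached when the listing is empty.
    """
    children = list_prefix(store, key)
    if children:
        return [("file", k) for k in children]
    if head(store, key):
        return [("file", key)]
    return []


if __name__ == "__main__":
    print(head_first_list_status(BUCKET, "b"))
    print(listing_first_list_status(BUCKET, "b"))
```

With a key that is only a file or only a prefix, both orders agree; they diverge only when an object and a prefix share the same name, which is the layout in this report.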
Is this a known issue, and are there any suggestions for working around it without changing the prefix structure?
Issue Links
- relates to HADOOP-17400 Optimize S3A for maximum performance in directory listings (Resolved)