Hadoop Common / HADOOP-18753

S3AFileSystem doesn't consistently handle prefixes that are both files and directories between versions

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.3.4
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      We have a prefix structure in which the path Spark reads is both a file and a directory: s3://a/b is the file we are trying to read, but s3://a/b/c is also a file. In 3.2.1, listStatus identifies a/b as a file, but as of 3.3.4 it identifies a/b as a directory and tries to read a/b/c instead of a/b.

      When s3GetFileStatus is called on the path with the StatusProbeEnum HEAD probe, the path is reported as a file. However, innerListStatus first assumes that any non-empty prefix is a directory; it calls s3GetFileStatus only for empty directories and for the listObjects results of the prefix.
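
      To make the difference in probe ordering concrete, here is a minimal sketch in plain Java. It is a toy model of a flat key store, not the S3A code: the class PrefixProbeDemo, its methods, and the bucket contents are all hypothetical, and it assumes only that a HEAD-style probe checks for an exact key while a LIST-style probe checks for keys under the prefix.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a flat object store; NOT the S3AFileSystem implementation. */
public class PrefixProbeDemo {
    enum Kind { FILE, DIRECTORY, NOT_FOUND }

    // Hypothetical bucket contents: "a/b" is an object and also a prefix of "a/b/c".
    static final Set<String> KEYS = new TreeSet<>(Set.of("a/b", "a/b/c"));

    /** HEAD-style probe: is there an object at exactly this key? */
    static boolean head(String key) {
        return KEYS.contains(key);
    }

    /** LIST-style probe: is there any object under key + "/"? */
    static boolean hasChildren(String key) {
        String prefix = key + "/";
        return KEYS.stream().anyMatch(k -> k.startsWith(prefix));
    }

    /** Checks for an exact object first, then falls back to listing. */
    static Kind headFirst(String key) {
        if (head(key)) {
            return Kind.FILE;
        }
        return hasChildren(key) ? Kind.DIRECTORY : Kind.NOT_FOUND;
    }

    /** Lists first; probes for an exact object only when the listing is empty. */
    static Kind listFirst(String key) {
        if (hasChildren(key)) {
            return Kind.DIRECTORY;
        }
        return head(key) ? Kind.FILE : Kind.NOT_FOUND;
    }

    public static void main(String[] args) {
        System.out.println("head-first: " + headFirst("a/b")); // prints "head-first: FILE"
        System.out.println("list-first: " + listFirst("a/b")); // prints "list-first: DIRECTORY"
    }
}
```

      In this model the same key resolves to a file or a directory depending only on which probe runs first, which matches the symptom described above. Whether the real code paths differ in exactly this way between the two versions is an assumption of the sketch.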

      Is this a known issue, and are there any suggestions for working around it without changing the prefix structure?

            People

              Assignee: Unassigned
              Reporter: Helen Weng (helenaut)