Hadoop Common / HADOOP-13278

S3AFileSystem mkdirs does not need to validate parent path components


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs/s3, tools
    • Labels: None

    Description

      According to S3 semantics, there is no conflict if a bucket contains a key named a/b and also a directory named a/b/c. "Directories" in S3 are, after all, nothing but prefixes.
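
      To make the "nothing but prefixes" point concrete, here is a minimal sketch that models a bucket as a flat, sorted set of keys. It is an in-memory stand-in, not the S3 or S3A API: the class and method names are hypothetical, and the trailing-slash key used as a directory marker is an assumption made for illustration.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical in-memory stand-in for a bucket: just a sorted set of keys. */
class FlatKeyspaceDemo {
    private final TreeSet<String> keys = new TreeSet<>();

    /** Stores a key; there is no notion of a parent that must exist first. */
    void put(String key) {
        keys.add(key);
    }

    /** True only if an object with exactly this key exists. */
    boolean hasObject(String key) {
        return keys.contains(key);
    }

    /** A "directory listing" is nothing more than a prefix scan over the keys. */
    List<String> listPrefix(String prefix) {
        return keys.stream()
                .filter(k -> k.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FlatKeyspaceDemo bucket = new FlatKeyspaceDemo();
        bucket.put("a/b");     // an ordinary object ("file")
        bucket.put("a/b/c/");  // assumed directory-marker convention for "a/b/c"

        // Both keys coexist; neither put was rejected.
        System.out.println(bucket.hasObject("a/b"));
        System.out.println(bucket.listPrefix("a/b/"));
    }
}
```

      In this model the object a/b and the "directory" a/b/c are simply two unrelated keys, which is why the store itself reports no conflict.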

      However, the mkdirs call in S3AFileSystem goes out of its way to walk up every parent path component of the directory it is creating, checking that none of them is an existing file. This is suboptimal for three main reasons:

      • Wasted API calls, since the client is getting metadata for each path component
      • This can cause major problems with buckets whose permissions are managed by IAM, where access may be granted only to some prefix and not to the bucket root. When you call mkdirs, even on a prefix you have access to, the traversal up the path eventually reaches the bucket root, which fails with a 403 - even though the directory creation call itself would have succeeded.
      • Some people might actually have a file that matches some other file's prefix... I can't see why they would want to do that, but it's not against S3's rules.
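
      The first two points can be illustrated with a toy model. This is not the S3AFileSystem code: the class, its methods, and the team/data/ grant are all hypothetical, and the store is an in-memory set that counts metadata lookups and denies any key outside the granted prefix.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only; every name here is hypothetical, not part of Hadoop. */
class ParentWalkDemo {
    /** Stand-in for a 403 response from the object store. */
    static class AccessDenied extends RuntimeException {
        AccessDenied(String key) {
            super("403 on '" + key + "'");
        }
    }

    private final Set<String> keys = new HashSet<>();
    private final String grantedPrefix;
    int metadataCalls = 0;

    ParentWalkDemo(String grantedPrefix) {
        this.grantedPrefix = grantedPrefix;
    }

    /** Simulated metadata lookup: counted, and denied outside the granted prefix. */
    boolean isFile(String key) {
        metadataCalls++;
        if (!key.startsWith(grantedPrefix)) {
            throw new AccessDenied(key);
        }
        return keys.contains(key);
    }

    /** Simulated write of a directory marker; allowed only under the granted prefix. */
    void putMarker(String dir) {
        String key = dir + "/";
        if (!key.startsWith(grantedPrefix)) {
            throw new AccessDenied(key);
        }
        keys.add(key);
    }

    /** Checks every ancestor for a conflicting file before creating the marker. */
    void mkdirsWithParentWalk(String dir) {
        String p = dir;
        int slash;
        while ((slash = p.lastIndexOf('/')) > 0) {
            p = p.substring(0, slash);
            if (isFile(p)) {
                throw new IllegalStateException(p + " is a file");
            }
        }
        putMarker(dir);
    }

    /** Creates the marker without looking at any ancestor. */
    void mkdirsDirect(String dir) {
        putMarker(dir);
    }

    public static void main(String[] args) {
        ParentWalkDemo store = new ParentWalkDemo("team/data/");
        try {
            store.mkdirsWithParentWalk("team/data/2016/06");
        } catch (AccessDenied e) {
            System.out.println("walk failed: " + e.getMessage()
                    + " after " + store.metadataCalls + " metadata calls");
        }
        store.mkdirsDirect("team/data/2016/06");
        System.out.println("direct create succeeded");
    }
}
```

      In this sketch the walk is denied as soon as it leaves the granted prefix, after spending one metadata call per ancestor visited, while the direct create touches only the key it is allowed to write.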

      I've opened a pull request with a simple patch that just removes this portion of the check. I have tested it with my team's instance of Spark + Luigi, and can confirm it works, and resolves the aforementioned permissions issue for a bucket on which we only had prefix access.

      This is my first ticket/pull request against Hadoop, so let me know if I'm not following some convention properly.

          People

            Assignee: Unassigned
            Reporter: Adrian Petrescu (apetresc)
