  1. Hadoop Common
  2. HADOOP-13204 Über-jira: S3a phase III: scale and tuning
  3. HADOOP-13654

S3A create() to support asynchronous check of dest & parent paths

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.7.3
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      One source of delays in S3A is the need to check whether the destination path exists in create(); this makes sure the operation isn't trying to overwrite a directory.

      1. This is slow: 1-4 HTTPS requests.
      2. The code doesn't seem to check the entire parent path to make sure there isn't a file as a parent (which raises the question: shouldn't we have a contract test for this?)
      3. Even with the create overwrite=false check, the new object isn't created until the output stream is close()'d, which means that the check has race conditions.

      Instead of doing a synchronous check in create(), we could do an asynchronous check of the parent directory tree. If an error surfaced, it could be cached and then thrown on the next call to write(), flush() or close(); that is, the failure of a create due to path problems would not surface immediately on the create() call, but it would surface before any writes were committed.
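
      The deferred-failure idea above can be sketched as follows. This is a minimal illustration, not S3A code: the class name DeferredCheckOutputStream and the pathCheck future are hypothetical, and the sketch assumes the path check is supplied as a CompletableFuture that completes exceptionally when the destination or a parent is invalid.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical wrapper, not the S3A implementation: it defers a failed
// path check to the next write(), flush() or close() call.
class DeferredCheckOutputStream extends OutputStream {
  private final OutputStream inner;
  private final CompletableFuture<Void> pathCheck;

  DeferredCheckOutputStream(OutputStream inner, CompletableFuture<Void> pathCheck) {
    this.inner = inner;
    this.pathCheck = pathCheck;
  }

  // Rethrow a cached failure. If waitForCheck is false and the check is
  // still running, return immediately so the caller is not blocked.
  private void surfaceFailure(boolean waitForCheck) throws IOException {
    if (!waitForCheck && !pathCheck.isDone()) {
      return;
    }
    try {
      pathCheck.join();
    } catch (CompletionException e) {
      Throwable cause = e.getCause();
      throw (cause instanceof IOException) ? (IOException) cause : new IOException(cause);
    }
  }

  @Override
  public void write(int b) throws IOException {
    surfaceFailure(false);
    inner.write(b);
  }

  @Override
  public void flush() throws IOException {
    surfaceFailure(false);
    inner.flush();
  }

  @Override
  public void close() throws IOException {
    // close() is the commit point, so it must wait for the check to finish.
    surfaceFailure(true);
    inner.close();
  }
}
```

      In this sketch close() blocks on the check, so a failed check always surfaces before the inner stream is closed; write() and flush() only report a failure that has already been recorded.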

      The full directory tree can/should be checked, and its results remembered. This would allow for the post-commit cleanup to issue delete() requests purely for those paths (if any) which referred to directories.
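
      As a rough sketch of remembering the tree check, assuming a hypothetical probe function that reports whether a path is missing, a file, or a directory (ParentTreeCheck, Status, and all method names here are illustrative, not S3A APIs):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: probes each ancestor once and keeps the results.
class ParentTreeCheck {
  enum Status { MISSING, FILE, DIRECTORY }

  // Probe every ancestor of dest (nearest first, root excluded) and
  // remember what each probe returned.
  static Map<String, Status> probeAncestors(String dest, Function<String, Status> probe) {
    Map<String, Status> results = new LinkedHashMap<>();
    int slash = dest.lastIndexOf('/');
    while (slash > 0) {
      String parent = dest.substring(0, slash);
      results.put(parent, probe.apply(parent));
      slash = parent.lastIndexOf('/');
    }
    return results;
  }

  // Return the nearest ancestor that is a file, or null if there is none.
  static String firstFileAncestor(Map<String, Status> results) {
    for (Map.Entry<String, Status> e : results.entrySet()) {
      if (e.getValue() == Status.FILE) {
        return e.getKey();
      }
    }
    return null;
  }

  // Only ancestors that were seen as directories need a delete() after commit.
  static List<String> directoriesToDelete(Map<String, Status> results) {
    List<String> toDelete = new ArrayList<>();
    for (Map.Entry<String, Status> e : results.entrySet()) {
      if (e.getValue() == Status.DIRECTORY) {
        toDelete.add(e.getKey());
      }
    }
    return toDelete;
  }
}
```

      A non-null result from firstFileAncestor() would be the error to cache and rethrow; the list from directoriesToDelete() is the reduced set of cleanup requests.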

      As well as the need to use the AWS thread pool, there's a bit of complexity with cancelling multipart uploads: the output stream needs to know that the request failed, and that the multipart should be aborted.
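
      One way the failure could reach an in-progress multipart upload is a completion callback on the check. The sketch below is an assumption-laden illustration: AbortableUpload and UploadGuard are hypothetical names, and abort() stands in for whatever call would cancel the multipart upload.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical handle for an in-progress multipart upload.
interface AbortableUpload {
  void abort();
}

// Hypothetical guard: aborts the upload at most once if the path check fails.
class UploadGuard {
  private final AtomicBoolean aborted = new AtomicBoolean(false);

  UploadGuard(CompletableFuture<Void> pathCheck, AbortableUpload upload) {
    // The callback runs when the check completes, on whichever thread
    // completes it (or immediately if it has already completed).
    pathCheck.whenComplete((ignored, failure) -> {
      if (failure != null && aborted.compareAndSet(false, true)) {
        upload.abort();
      }
    });
  }

  // The output stream can poll this before queuing further parts.
  boolean isAborted() {
    return aborted.get();
  }
}
```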

      If the complexity of the asynchronous calls can be coped with, and client code is happy to accept errors in any IO call to the output stream, then the initial overhead at file creation could be skipped.

          People

            Assignee: Unassigned
            Reporter: Steve Loughran (stevel@apache.org)