Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

      Description

      Object stores do not have an efficient rename operation, which the Hadoop FileOutputCommitter uses to atomically promote the "winning" attempt, out of multiple (speculative) attempts, to the final path. These slow job commits are one of the main friction points when using object stores in Hadoop. There have been several attempts at resolving this (HADOOP-9565, the Apache Spark DirectOutputCommitters, ...), but they have proven not to be robust in the face of adversity (network partitions, ...).

      This ticket proposes to do the atomic commit using the S3 Multipart API, which allows multiple concurrent uploads to the same object name, each in its own "temporary space" identified by the UploadId returned in the response to InitiateMultipartUpload. Every attempt writes directly to the final outputPath. Data is uploaded using Put Part; the response to each call carries an ETag for the part, which is stored. The CompleteMultipartUpload call is postponed. Instead, we persist the UploadId (in a _temporary subdir or elsewhere) and the ETags. When a certain "job" wins, CompleteMultipartUpload is called for each of its files using the proper list of part ETags.

      Completing a MultipartUpload is a metadata-only operation (internally in S3) and is thus orders of magnitude faster than the rename-based approach, which moves all the data.
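
      The delayed-completion flow described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: ToyObjectStore, write_attempt and commit_job are hypothetical names invented for this illustration, the store imitates the multipart semantics locally rather than calling S3 or any AWS SDK, and aborting the losing uploads is an added assumption that the ticket does not spell out.

```python
import hashlib
import uuid


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store with multipart uploads."""

    def __init__(self):
        self.objects = {}  # key -> bytes; only completed uploads are visible here
        self.uploads = {}  # upload_id -> (key, {part_number: (etag, data)})

    def initiate_multipart_upload(self, key):
        # Each upload gets its own private space, even for the same key.
        upload_id = uuid.uuid4().hex
        self.uploads[upload_id] = (key, {})
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        _, parts = self.uploads[upload_id]
        # Toy tag: a content hash standing in for the ETag the store returns.
        etag = hashlib.sha256(data).hexdigest()
        parts[part_number] = (etag, data)
        return etag

    def complete_multipart_upload(self, upload_id, part_etags):
        key, parts = self.uploads[upload_id]
        chunks = []
        for number, etag in sorted(part_etags):
            stored_etag, data = parts[number]
            if stored_etag != etag:
                raise ValueError(f"ETag mismatch for part {number}")
            chunks.append(data)
        del self.uploads[upload_id]
        # Only now does the object become visible under its final key.
        self.objects[key] = b"".join(chunks)

    def abort_multipart_upload(self, upload_id):
        self.uploads.pop(upload_id, None)


def write_attempt(store, key, chunks):
    """Upload all parts for one attempt, but do NOT complete the upload."""
    upload_id = store.initiate_multipart_upload(key)
    parts = [
        (number, store.upload_part(upload_id, number, chunk))
        for number, chunk in enumerate(chunks, start=1)
    ]
    # This record is what an attempt would persist (UploadId plus part ETags).
    return {"key": key, "upload_id": upload_id, "parts": parts}


def commit_job(store, winner, losers):
    """Complete every pending upload of the winner; abort the losers (assumption)."""
    for pending in winner:
        store.complete_multipart_upload(pending["upload_id"], pending["parts"])
    for pending in losers:
        store.abort_multipart_upload(pending["upload_id"])


# Two speculative attempts write to the same final key concurrently.
store = ToyObjectStore()
first = write_attempt(store, "out/part-0000", [b"hello ", b"world"])
second = write_attempt(store, "out/part-0000", [b"HELLO ", b"WORLD"])
visible_before_commit = "out/part-0000" in store.objects

# The second attempt wins; only its data becomes visible.
commit_job(store, winner=[second], losers=[first])
```

      In this model nothing is visible under the final key until commit_job runs, and the commit itself moves no data between keys, which is the property the proposal relies on.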

      Required work:

      • Expose the multipart initiate and complete calls in S3AOutputStream to S3AFileSystem
      • Use these multipart calls in a custom committer as described above. I propose to build on the S3ACommitter Steve Loughran is doing for HADOOP-13786

          Activity

          Thomas Demoor added a comment - Extensive write-up by Steve Loughran based on the conf call we had this week where I explained the idea: https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delayed-put-commit.md
          stevel@apache.org Steve Loughran added a comment -

          This is big and complex enough that it may be worthwhile pulling out into its own top-level JIRA; this will make it easier to divide up bits of work.

          stevel@apache.org Steve Loughran added a comment -

          We need a name for this. S3Guard committer? I know it's being designed to work directly with a consistent S3 instance, but it all comes together. Calling it the S3Guard one will discourage people from trying to use it without S3Guard turned on.

          stevel@apache.org Steve Loughran added a comment -

          closing as duplicate of HADOOP-1786; adding subjiras there

          cguan chao Guan added a comment -

          Do you mean HADOOP-13786?


            People

            • Assignee: Thomas Demoor
            • Reporter: Thomas Demoor
            • Votes: 0
            • Watchers: 8
