  1. Hadoop Common
  2. HADOOP-15620 Über-jira: S3A phase VI: Hadoop 3.3 features
  3. HADOOP-16632

Speculating & Partitioned S3A magic committers can leave pending files under __magic


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.2.1, 3.1.3
    • Fix Version/s: 3.3.0
    • Component/s: fs/s3
    • Labels: None

    Description

      Partitioned S3A magic committers can leave pending files, and maybe uploaded data, behind.

      This surfaced in an assertion failure on a parallel test run.

      I thought it was actually a test failure, but with HADOOP-16207 all the docs are preserved in the local FS and I can understand what happened.

      Junit process

      [INFO] 
      [ERROR] Failures: 
      [ERROR] ITestS3ACommitterMRJob.test_200_execute:344->customPostExecutionValidation:356 Expected a java.io.FileNotFoundException to be thrown, but got the result: : "Found magic dir which should have been deleted at S3AFileStatus{path=s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic; isDirectory=true; modification_time=0; access_time=0; owner=stevel; group=stevel; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=UNKNOWN eTag=null versionId=null
      [s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/tasks/attempt_1570197469968_0003_m_000008_1/__base/part-m-00008
      s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/tasks/attempt_1570197469968_0003_m_000008_1/__base/part-m-00008.pending
      

      Full details to follow in the comment as they are, well, detailed.

       

      Key point: AM-side job and task cleanup can happen before a worker task finishes its writes. This leaves files under __magic. It may leave pending uploads too, but only if the write began after the AM job cleanup did its list + abort of all pending uploads under the destination directory.
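      The ordering described above can be sketched with a toy in-memory model. This is not the S3A committer code: the `Store` class, its methods, and the shortened paths are hypothetical stand-ins, and the model assumes a magic write creates a marker file plus a `.pending` file under __magic, as the listing in the test failure suggests.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of the cleanup race; all names here are illustrative assumptions. */
public class MagicCleanupRace {

  /** Hypothetical stand-in for the object store. */
  public static final class Store {
    // upload id -> destination key of an incomplete multipart upload
    public final Map<String, String> pendingUploads = new TreeMap<>();
    // object keys currently visible in the store
    public final Set<String> objects = new TreeSet<>();
    private int nextId = 0;

    /** A task starts a magic write: one pending upload plus files under __magic. */
    public String beginMagicWrite(String destKey, String magicKey) {
      String id = "upload-" + (nextId++);
      pendingUploads.put(id, destKey);
      objects.add(magicKey);
      objects.add(magicKey + ".pending");
      return id;
    }

    /** AM-side cleanup: list + abort uploads under destDir, then delete magicDir. */
    public void jobCleanup(String destDir, String magicDir) {
      pendingUploads.values().removeIf(key -> key.startsWith(destDir));
      objects.removeIf(key -> key.startsWith(magicDir));
    }
  }

  /** Runs the sequence: early write, AM cleanup, late write by a speculative attempt. */
  public static Store simulate() {
    Store store = new Store();
    String dest = "dest/";
    String magic = dest + "__magic/";

    // Attempt 0 starts its write before cleanup: cleanup will catch it.
    store.beginMagicWrite(dest + "part-m-00008",
        magic + "tasks/attempt_0/__base/part-m-00008");

    // The AM decides the job is done and cleans up.
    store.jobCleanup(dest, magic);

    // Speculative attempt 1 is still running and starts its write afterwards.
    store.beginMagicWrite(dest + "part-m-00008",
        magic + "tasks/attempt_1/__base/part-m-00008");
    return store;
  }

  public static void main(String[] args) {
    Store store = simulate();
    System.out.println("Pending uploads left: " + store.pendingUploads);
    System.out.println("Objects left under __magic: " + store.objects);
  }
}
```

      In this model the write that began before cleanup is aborted and removed, while the write that began afterwards leaves both its __magic files and a pending upload, matching the two cases described above.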


          People

            Assignee: Steve Loughran (stevel@apache.org)
            Reporter: Steve Loughran (stevel@apache.org)