  1. Hadoop Common
  2. HADOOP-16829 Über-jira: S3A Hadoop 3.3.1 features
  3. HADOOP-17318

S3A committer to support concurrent jobs with same app attempt ID & dest dir

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.1
    • Component/s: fs/s3

      Description

      Reported failure of magic committer block uploads because the pending upload ID is unknown. Likely cause: the upload was aborted by another job.

      1. Make it possible to turn off cleanup of pending uploads in the magic committer.
      2. Log more about uploads being deleted in the committers.
      3. Add the upload ID to the S3ABlockOutputStream errors.

      There are other concurrency issues on closer inspection; see SPARK-33230:

      • The magic committer uses the app attempt ID as the path under __magic; if two jobs have duplicate IDs, they will conflict.
      • The staging committer's local temp dir uses the app attempt ID.
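
      The path conflict can be illustrated with a minimal sketch. The directory layout is simplified, and the class `MagicPathSketch`, the helper `magicJobDir`, and the bucket and ID strings are hypothetical names for illustration, not the committer's actual code.

```java
import java.util.UUID;

/** Illustrative only: a simplified layout, not the committer's real path logic. */
final class MagicPathSketch {

    /** Hypothetical helper: a job's working directory under the destination's __magic dir. */
    static String magicJobDir(String destDir, String jobId) {
        return destDir + "/__magic/" + jobId;
    }

    public static void main(String[] args) {
        String dest = "s3a://bucket/table";

        // Two independent jobs whose app attempt IDs happen to be identical.
        String attemptA = "app-attempt-0000";
        String attemptB = "app-attempt-0000";
        boolean sameDir = magicJobDir(dest, attemptA).equals(magicJobDir(dest, attemptB));
        System.out.println("attempt IDs collide: " + sameDir); // true: both jobs share one directory

        // With a per-job UUID, each job gets its own directory.
        String uuidA = UUID.randomUUID().toString();
        String uuidB = UUID.randomUUID().toString();
        boolean uuidCollide = magicJobDir(dest, uuidA).equals(magicJobDir(dest, uuidB));
        System.out.println("UUIDs collide: " + uuidCollide);
    }
}
```

      When two jobs share a directory, cleanup by one job can abort or delete the pending uploads of the other, which matches the reported failure.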

      The fix will be to have a job UUID. For Spark it will be picked up from the SPARK-33230 changes, with an option to self-generate it in job setup for Hadoop 3.3.1+ with older Spark builds. Otherwise, fall back to the app attempt ID unless that fallback has been disabled.

      MR: configure to use the app attempt ID.
      Spark: configure to fail job setup if the app attempt ID is the source of the job UUID.
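
      The resolution order described above could look roughly like the following sketch. It uses a plain `Map` in place of a Hadoop `Configuration`, and the class `JobUuidSketch` and method `resolveJobUuid` are hypothetical. The option key strings are written from recollection of the options associated with this work and should be checked against the S3A committer documentation before use.

```java
import java.util.Map;
import java.util.UUID;

/** Sketch of the job UUID fallback chain; not the actual committer implementation. */
final class JobUuidSketch {

    // Assumed option names; verify against the S3A committer documentation.
    static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";
    static final String GENERATE_UUID = "fs.s3a.committer.generate.uuid";
    static final String REQUIRE_UUID = "fs.s3a.committer.require.uuid";

    /** Hypothetical helper: choose the job UUID from config, else the app attempt ID. */
    static String resolveJobUuid(Map<String, String> conf, String appAttemptId) {
        // 1. Prefer a UUID passed down by Spark (the SPARK-33230 changes).
        String fromSpark = conf.get(SPARK_WRITE_UUID);
        if (fromSpark != null && !fromSpark.isEmpty()) {
            return fromSpark;
        }
        // 2. Optionally self-generate one in job setup (older Spark builds).
        if (Boolean.parseBoolean(conf.getOrDefault(GENERATE_UUID, "false"))) {
            return UUID.randomUUID().toString();
        }
        // 3. Fail job setup if falling back to the app attempt ID is disabled.
        if (Boolean.parseBoolean(conf.getOrDefault(REQUIRE_UUID, "false"))) {
            throw new IllegalStateException("No job UUID set and fallback is disabled");
        }
        // 4. Otherwise fall back to the app attempt ID (the MR case).
        return appAttemptId;
    }

    public static void main(String[] args) {
        System.out.println(resolveJobUuid(Map.of(SPARK_WRITE_UUID, "job-1234"), "attempt_01")); // job-1234
        System.out.println(resolveJobUuid(Map.of(), "attempt_01")); // attempt_01
    }
}
```

      In this sketch an MR job leaves both flags unset and gets the app attempt ID, while a Spark deployment sets the require flag so that a missing UUID fails job setup instead of silently sharing paths.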

            People

            • Assignee:
              stevel@apache.org Steve Loughran
              Reporter:
              stevel@apache.org Steve Loughran

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 8h 10m