Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-7282

MR v2 commit algorithm should be deprecated and not the default

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • 3.3.0, 3.2.1, 3.1.3, 3.3.1
    • None
    • mrv2

    Description

      The v2 MR commit algorithm moves files from the task attempt dir into the dest dir on task commit -one by one

      It is therefore not atomic

      1. if a task commit fails partway through and another task attempt commits -unless exactly the same filenames are used, output of the first attempt may be included in the final result
      2. if a worker partitions partway through task commit, and then continues after another attempt has committed, it may partially overwrite the output -even when the filenames are the same

      Both MR and spark assume that task commits are atomic. Either they need to consider that this is not the case, we add a way to probe for a committer supporting atomic task commit, and the engines both add handling for task commit failures (probably fail job)

      Better: we remove this as the default, maybe also warn when it is being used

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            stevel@apache.org Steve Loughran
            Votes:
            0 Vote for this issue
            Watchers:
            21 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 0.5h
                0.5h

                Slack

                  Issue deployment