Hadoop Common / HADOOP-1127

Speculative Execution and output of Reduce tasks


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      We've recently seen jobs run with 'speculative execution' become quite unstable and fail with an AlreadyBeingCreatedException raised at the NameNode. There are also potentially hairy situations where a failed Reduce task's output could clash with the output of a successful task for the same TaskInProgress (tip).

      As it exists, speculative execution relies on the PhasedFileSystem, which creates a temp output file; on task completion that file is 'moved' to its final position via a call to PhasedFileSystem.commit from ReduceTask.run(). This has led to issues such as the above.
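
A minimal sketch of the write-to-temp-then-move pattern described above, and of how two attempts that each commit themselves can clash. It uses plain java.nio.file rather than Hadoop, so it is not the PhasedFileSystem API; the class name, method names, temp-file naming and the part-00000 file name are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the temp-file-then-commit flow; not Hadoop code.
class TempThenCommit {

    /** Writes an attempt's output to a temp file next to the final path. */
    static Path writeTemp(Path finalPath, String attemptId, String data) throws IOException {
        // Temp naming scheme is an assumption for this sketch.
        Path temp = finalPath.resolveSibling(finalPath.getFileName() + "." + attemptId + ".tmp");
        Files.write(temp, data.getBytes(StandardCharsets.UTF_8));
        return temp;
    }

    /** "Commit": move the temp file to its final position. */
    static void commit(Path temp, Path finalPath) throws IOException {
        // Without REPLACE_EXISTING, Files.move fails if the target already exists.
        Files.move(temp, finalPath);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("reduce-output");
        Path finalPath = dir.resolve("part-00000");

        // Two speculative attempts of the same task each produce a temp file.
        Path first = writeTemp(finalPath, "attempt_0", "from attempt 0");
        Path second = writeTemp(finalPath, "attempt_1", "from attempt 1");

        // Each attempt commits on its own, with no coordination between them.
        commit(first, finalPath);
        try {
            commit(second, finalPath);
        } catch (FileAlreadyExistsException e) {
            System.out.println("second commit clashed with existing output");
        }
    }
}
```

The point of the sketch is only that the commit is performed by each task independently, so nothing decides in one place which attempt's output wins.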

      Proposal:

      The basic idea is to do this uniformly for all Reduce tasks, i.e. all reducers create temp files, and a serialized 'commit' done by the JobTracker moves each temp file to its final position.

      We create the temp file in the job's output directory itself:
      <output_dir>/<taskid> (emphasis on the leading '')

      On task completion we'll add that temp file's path to the TaskStatus, and the JobTracker then moves that file to its final position.
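
The proposal could be sketched roughly as follows, again with plain java.nio.file instead of Hadoop classes. A single committer object stands in for the JobTracker: attempts report their temp file path, and one synchronized method moves the first reported output for each task into place and discards later ones. SerializedCommitter, TEMP_PREFIX, the "part-" naming and the attempt ids are all hypothetical names chosen for this example, not taken from the patches.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a JobTracker-side serialized commit; not Hadoop code.
final class SerializedCommitter {
    // Assumed marker for temp files inside the output directory.
    static final String TEMP_PREFIX = "tmp-";

    private final Path outputDir;
    private final Set<String> committedTasks = new HashSet<>();

    SerializedCommitter(Path outputDir) {
        this.outputDir = outputDir;
    }

    /** Temp location for one attempt, inside the job's output directory itself. */
    static Path tempPath(Path outputDir, String attemptId) {
        return outputDir.resolve(TEMP_PREFIX + attemptId);
    }

    /**
     * Commits one attempt's temp file for the given task. Calls are serialized,
     * so only the first reported attempt per task is moved into place; the temp
     * files of later attempts are deleted. Returns true if this attempt won.
     */
    synchronized boolean commit(String taskId, Path tempFile) throws IOException {
        if (committedTasks.contains(taskId)) {
            Files.deleteIfExists(tempFile);
            return false;
        }
        Files.move(tempFile, outputDir.resolve("part-" + taskId));
        committedTasks.add(taskId);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        SerializedCommitter committer = new SerializedCommitter(out);

        // Two speculative attempts of the same reduce task write temp files.
        Path a = tempPath(out, "r_000000_0");
        Path b = tempPath(out, "r_000000_1");
        Files.write(a, "attempt 0".getBytes(StandardCharsets.UTF_8));
        Files.write(b, "attempt 1".getBytes(StandardCharsets.UTF_8));

        // Each attempt reports its temp path; the committer decides the winner.
        System.out.println(committer.commit("000000", a));
        System.out.println(committer.commit("000000", b));
    }
}
```

Because every move goes through one place, a late or duplicate attempt cannot overwrite or collide with output that has already been committed for the same task.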

      Thoughts?

      Attachments

        1. HADOOP-1127_20070328_1.patch
          11 kB
          Arun Murthy
        2. HADOOP-1127_20070331_2.patch
          11 kB
          Arun Murthy
        3. HADOOP-1127_20070402_3.patch
          13 kB
          Arun Murthy
        4. HADOOP-1127_20070403_4.patch
          14 kB
          Arun Murthy
        5. HADOOP-1127_20070405_5.patch
          14 kB
          Arun Murthy
        6. HADOOP-1127_20070409_6.patch
          15 kB
          Arun Murthy
        7. HADOOP-1127_20070419_7.patch
          19 kB
          Arun Murthy
        8. HADOOP-1127_20070420_8.patch
          19 kB
          Arun Murthy
        9. HADOOP-1127_20070423_9.patch
          22 kB
          Arun Murthy
        10. HADOOP-1127_20070424_10.patch
          23 kB
          Arun Murthy


          People

            Assignee: Arun Murthy (acmurthy)
            Reporter: Arun Murthy (acmurthy)
            Votes: 0
            Watchers: 0

