Hadoop Common / HADOOP-1201

Progress reporting can be improved for both Map/Reduce tasks

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate

    Description

      Both the map and reduce tasks report progress from separate threads. In the ReduceTask, however, once the sort phase is over, progress is reported inline with the reducer invocations. This slows down the reduce phase, since every progress report involves an RPC. It would be better to report progress for all phases from a separate thread and have the tasks just update the progress fields.
      One proposal is to extract the reporting logic that currently lives in MapTask/ReduceTask into a new class in the Task superclass, with methods that control what is reported and when. Thoughts?
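
      To make the proposal concrete, here is a minimal sketch of the idea, not Hadoop's actual code. The class name `ProgressReporter`, its methods, and the `DoubleConsumer` sink (a stand-in for the RPC call) are all hypothetical and chosen only for illustration. The task thread writes a volatile field, and a background thread sends it at a fixed interval.

```java
import java.util.function.DoubleConsumer;

/** Hypothetical sketch: the task updates a field, a background thread reports it. */
class ProgressReporter implements Runnable {
    private volatile float progress;    // written by the task thread, read by the reporter
    private volatile boolean done;
    private final long intervalMillis;  // how often a report may be sent
    private final DoubleConsumer sink;  // assumption: stands in for the progress RPC
    private final Thread thread;

    ProgressReporter(long intervalMillis, DoubleConsumer sink) {
        this.intervalMillis = intervalMillis;
        this.sink = sink;
        this.thread = new Thread(this, "progress-reporter");
        this.thread.setDaemon(true);
    }

    void start() {
        thread.start();
    }

    /** Called from the map/reduce loop; only a field write, no RPC. */
    void setProgress(float value) {
        progress = value;
    }

    /** Stops the reporter and waits until the final value has been sent. */
    void stop() {
        done = true;
        thread.interrupt();  // wake the reporter if it is sleeping
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void run() {
        float lastSent = -1f;
        while (!done) {
            float current = progress;
            if (current != lastSent) {  // skip the call when nothing changed
                sink.accept(current);
                lastSent = current;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                break;  // stop() was called
            }
        }
        float finalValue = progress;
        if (finalValue != lastSent) {   // make sure the last update is not lost
            sink.accept(finalValue);
        }
    }
}
```

      With this shape, a reducer loop that calls `setProgress` once per record costs one field write per record, while the number of calls to the sink is bounded by the reporting interval rather than by the record count.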

            People

              Vivek Ratan (vivekr)
              Devaraj Das (ddas)
              Votes: 0
              Watchers: 0
