As part of a good solution (for 0.14 or later), I think we should separate the reporting of progress by the sort/merge/user code from the reporting of progress from the Task to the TaskTracker.
For the former, we make the Reporter object available to the MapReduce kernel code, as Devaraj suggested, and at other appropriate places as discussed in this conversation. Wherever progress is made that we need to report (during sort or merge or whatever), the kernel code or the user's code calls the Reporter object.
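To make the local half concrete, here is a minimal sketch of what I have in mind. The names (LocalProgress, LocalReporter, MergeStep) are made up for illustration and are not the existing Hadoop classes; the only point is that the kernel or user code updates in-memory state and never touches RPC.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the shared Progress data structures.
final class LocalProgress {
    private final AtomicLong version = new AtomicLong(); // bumped on every report
    private volatile float fraction;                      // 0.0 to 1.0
    private volatile String status = "";

    // Record progress locally. Fields are written before the version bump so
    // that a reader that sees the new version also sees the new values.
    void set(float fraction, String status) {
        this.fraction = fraction;
        this.status = status;
        version.incrementAndGet();
    }

    long version()   { return version.get(); }
    float fraction() { return fraction; }
    String status()  { return status; }
}

// Hypothetical local reporter handed to kernel and user code. No RPC here.
interface LocalReporter {
    void progress(float fraction, String status);
}

// Example of kernel code (a merge loop) reporting as it goes.
final class MergeStep {
    static void mergeSegments(int segments, LocalReporter reporter) {
        for (int i = 1; i <= segments; i++) {
            // ... merge segment i here ...
            reporter.progress((float) i / segments,
                              "merged " + i + "/" + segments);
        }
    }
}
```

A caller could wire the two together with `MergeStep.mergeSegments(4, progress::set)`, where `progress` is a LocalProgress instance that the reporting thread reads later.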
Separately, for the latter, we probably should continue with the Progress thread. This thread looks at the Progress data structures and sends progress info to the TaskTracker via RPC. To avoid the problem that this bug was filed for, we have two likely options:
1. The thread continues doing what it does now: it sends the progress information at regular intervals, and the TaskTracker decides whether the task has really made progress, based on what it received earlier. Or
2. The thread decides whether progress has really been made and makes an RPC call only if necessary. Even when no progress has been made, it may still need to make a call to prevent the TaskTracker from killing the task, if we eliminate the Ping thread (see issue 1201).
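Option 2 could look roughly like the sketch below. Everything here is hypothetical: TrackerLink stands in for whatever RPC interface the Task uses to talk to the TaskTracker, the counter stands in for the Progress data structures, and pingEvery is an invented knob for the keep-alive case where the Ping thread is gone.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the Task-to-TaskTracker RPC interface.
interface TrackerLink {
    void sendProgress(long counter); // a real progress update
    void ping();                     // keep-alive only, no progress claimed
}

final class ProgressSender {
    private final AtomicLong counter; // bumped by the local reporting code
    private final TrackerLink link;
    private final int pingEvery;      // idle ticks allowed before a keep-alive
    private long lastSent;
    private int idleTicks = 0;

    ProgressSender(AtomicLong counter, TrackerLink link, int pingEvery) {
        this.counter = counter;
        this.link = link;
        this.pingEvery = pingEvery;
        this.lastSent = counter.get(); // nothing to report until it changes
    }

    // One iteration of the thread's loop. The thread would call this at a
    // regular interval; it is a plain method here so it can be tested.
    void tick() {
        long now = counter.get();
        if (now != lastSent) {
            // Progress was made since the last call: report it.
            link.sendProgress(now);
            lastSent = now;
            idleTicks = 0;
        } else if (++idleTicks >= pingEvery) {
            // No progress, but keep the TaskTracker from timing us out.
            link.ping();
            idleTicks = 0;
        }
    }
}
```

With option 1 the thread would instead call sendProgress on every tick and the comparison against the last value would live in the TaskTracker.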
The second option is probably better, as the logic to decide whether progress has been made may be easier to implement in the thread than in the TaskTracker. As discussed earlier in this conversation, we may resume/suspend the thread, or at least make sure we start and stop it at the right places. But I'd suggest we separate the issue of reporting progress locally (via the Reporter object) from reporting progress to the TaskTracker (via a thread). The logic for the two issues is different, and separating the code will make things cleaner and easier to change.