Thanks for the patch, Chang!
Note that the point of this change is to let users locate any logs for applications that failed in the ASSIGNED state. With a canned fake started event there's no way to determine which nodemanager tried to run the container, and therefore we can't provide a good logs link. We need to preserve as much information as we can about the task, and that includes the host, http port, etc.
The good news is that we have most of this information from the container that was assigned to the task attempt. See the code for LaunchedContainerTransition for details. It would be nice to see some of the code in that transition factored out so it can be reused when we are creating the start event for an attempt that failed in the ASSIGNED state. Also I would hesitate to call it a fake event. It's still a task started event, but we are missing just a few key components like the shuffle port and the start time. If we factor out the code from LaunchedContainerTransition then we can drop the "fake" part.
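To illustrate the kind of factoring I have in mind, here is a rough sketch. Every class and method name in it (ContainerInfo, StartedEvent, StartedEventFactory, UNKNOWN_PORT) is a hypothetical stand-in and not the actual MapReduce code; the real event and container types carry more fields. The idea is just that one helper builds the started event from the assigned container, and both the launched path and the failed-while-ASSIGNED path call it.

```java
// Hypothetical stand-ins for illustration only; not the real Hadoop classes.
final class ContainerInfo {
    final String nodeHost;   // host of the nodemanager the container was assigned to
    final int nodeHttpPort;  // nodemanager http port, needed to build a logs link

    ContainerInfo(String nodeHost, int nodeHttpPort) {
        this.nodeHost = nodeHost;
        this.nodeHttpPort = nodeHttpPort;
    }
}

final class StartedEvent {
    final String attemptId;
    final String host;
    final int httpPort;
    final int shufflePort;
    final long startTime;

    StartedEvent(String attemptId, String host, int httpPort,
                 int shufflePort, long startTime) {
        this.attemptId = attemptId;
        this.host = host;
        this.httpPort = httpPort;
        this.shufflePort = shufflePort;
        this.startTime = startTime;
    }
}

final class StartedEventFactory {
    // Assumed placeholder for a port that is not known yet.
    static final int UNKNOWN_PORT = -1;

    // Shared builder: everything that comes from the assigned container lives here.
    static StartedEvent create(String attemptId, ContainerInfo container,
                               int shufflePort, long startTime) {
        return new StartedEvent(attemptId, container.nodeHost,
                container.nodeHttpPort, shufflePort, startTime);
    }

    // Normal path: the container launched, so shuffle port and start time are known.
    static StartedEvent forLaunched(String attemptId, ContainerInfo container,
                                    int shufflePort, long startTime) {
        return create(attemptId, container, shufflePort, startTime);
    }

    // Failed-while-ASSIGNED path: same event, minus the pieces we never learned.
    static StartedEvent forAssignedOnly(String attemptId, ContainerInfo container) {
        return create(attemptId, container, UNKNOWN_PORT, 0L);
    }
}
```

With something along these lines the event for the ASSIGNED case still carries the real host and http port, so there is nothing "fake" about it.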
Is forceFinishTime really necessary? We can go ahead and set the launch time as we are processing the task started event and then just call setFinishTime.
In general I think we should focus on generating a proper task start event and then let the normal task unsuccessful completion event code handle things after that. For example, in DeallocateContainerTransition I think we should be generating the job counter update events for this scenario, but we don't, since we go down a different task unsuccessful completion event handling path when launchTime is zero. Seems like we should just generate the missing start event when launchTime is zero and then fall through to the normal unsuccessful completion event handling code in all cases after that.
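Roughly, the control flow I am suggesting looks like the sketch below. AttemptSketch, its fields, and the event names are all made up for illustration, and the order of the events is just one plausible arrangement; the real transition code deals with actual event objects and handlers. Using the current time as the launch time is also an assumption here. The point is only that the launchTime == 0 case adds a start event and then shares the rest of the path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a task attempt; not the real TaskAttemptImpl.
final class AttemptSketch {
    long launchTime;  // 0 means the attempt never got a started event
    long finishTime;
    final List<String> emitted = new ArrayList<>();

    void setFinishTime(long now) {
        finishTime = now;
    }

    void handleUnsuccessfulCompletion(long now) {
        if (launchTime == 0) {
            // Attempt failed while ASSIGNED: generate the missing start event
            // and record a launch time while processing it (assumed: "now").
            emitted.add("TASK_STARTED");
            launchTime = now;
        }
        // From here on there is a single path for every case, so the
        // counter updates are no longer skipped for the ASSIGNED failure.
        setFinishTime(now);
        emitted.add("JOB_COUNTER_UPDATE");
        emitted.add("TASK_UNSUCCESSFUL_COMPLETION");
    }
}
```

Since the launch time is set before the common code runs, a plain setFinishTime call is enough and a separate forced variant should not be needed.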
Nit: missing whitespace before new method in MRApp.