Thanks for the comments Tan, Wangda,
When app goes to final state (FINISHED/KILLEd, etc.), should we simply set AMLaunchDiagnostics to null?
IIUC you are referring to RMAppAttemptImpl right ?, if so its mistake while correcting based on your previous comment missed to revert this part but anyway as per your 4th comment in cases of unmanaged AM i have updated it to null here.
Why need two separate methods: updateDiagnosticsIfNotRunning/updateDiagnostics?
May be the name needs to be proper but two methods are required as the status needs to be updated only if AM is not running for example its called in FiCaSchedulerApp.allocate, this method will be called whenever container is assiged for a app but we want to update the diagnostic only when the AM is not yet launched. and similarly used in LeafQueue.assignContainers. But in some cases we are sure that the AM is not yet launched hence to avoid unwanted verification (whether AM is running) we have updateDiagnostics. May be i can name them as checkAndUpdateAMContainerDiagnostics and updateAMContainerDiagnostics ?
Do you think is it better to rename AMState.PENDING to inactivated?
Yes, PENDING is not understandable to all hence the diagnostic message for PENDING is already set as "Application is added to the scheduler and is not yet activated." may be i can mention it as Application is added to the scheduler but is not yet scheduled. Thoughts?
Instead of setting AMLaunchDiagnostics to null when RMAppAttempt enters Scheduled state,do you think is it better to do that in RUNNING and FINAL_SAVING state? Unmanaged AM could skip the SCHEDULED state.
IMO i would prefer to set only for Unmanaged AMs in FINAL_SAVING state as already we are showing the YarnApplicationState as running and giving description abt it. so again if diagnostics is also showing that AM is launched and running then it can becomes repetitive in UI for normal (non unmanaged AM) apps.
It will be also very usaful if you can update AM launch diagnostics when RMAppAttempt go to LAUNCHED state,
Actually i wrongly considered AMContainerAllocatedTransition to reset the diag message, my intention was to reset only after its launched and registered. This would be very usefull for analyzing the state of AM. Have introduced LAUNCHED and setting after AMLauncher sends LAUNCHED event to RMAppAttempt.
Tan, Wangda & Jian He
Please review the latest patch,