Wow, lots of comments. Thanks to everyone looking at the patch.
I had observed that if I made my AM crash (by putting an exit(1) in shutdownJob()), then the history files would get orphaned and never cleaned up, or something close to that.
Thanks for the heads up. I will look into that.
Why not end in success if the staging dir was cleaned up by the last attempt?
Because we crashed somewhere after staging was cleaned up and before we unregistered. Crashing seems like an error to me, but I suppose we could change it. As for what the client ultimately sees for success or failure, we will rely on the history server to report that.
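The crash window described above can be sketched as follows. This is a self-contained illustration, not the real MRAppMaster code; the field and method names are made up for the example.

```java
// Illustrative sketch of the shutdown ordering discussed above: staging is
// cleaned up first, and a crash before unregistering leaves the RM thinking
// the job is still running even though the staging dir is already gone.
final class ShutdownSketch {
  boolean stagingCleanedUp = false;
  boolean unregistered = false;

  void shutdownJob(boolean crashBetween) {
    stagingCleanedUp = true;   // staging dir removed first
    if (crashBetween) {
      return;                  // simulated exit(1): unregister never happens
    }
    unregistered = true;       // normally we then tell the RM the job is done
  }
}
```

A recovered attempt in this state sees a clean staging directory but an unfinished registration, which is why ending in "success" is debatable.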
I am guessing that this code won't be necessary after we move the unregister to the RM before the staging dir cleanup in MAPREDUCE-4841?
Yes and no. Once MAPREDUCE-4841 goes in, there is an increased possibility of leaking staging directories. I have seen users on 1.0 blow away their staging directory to clean things up, which caused jobs to fail. Granted, they are more likely to get errors from the distributed cache not finding the files it needs, but in either case I would like to be paranoid and guard against that.
Why are we only eating/ignoring the JobEvents in the dispatcher? Is it so that the JobImpl state machine is not triggered?
In the new code path we have not wired up everything: JobImpl is created, but the JobEventDispatcher is not. I did not want to have to deal with recovering the complete state of the job, which in some cases may not even be possible. This is also why I am not bringing up the RPC server, which, now that you mention it, means I probably also need to update the UI/client to deal with that appropriately. The typo you found was just there for debugging this situation. (I'll fix the typo, by the way.)
This might be a question of personal preference, but I think an explicit transition from the INIT state to the final state is cleaner than overriding the state in the getter.
I actually wanted to put in a stubbed-out Job instead, but there are too many places where Job is cast to JobImpl just to get the state, making it difficult to do so. I will look again to see if I can split the two apart, or add in a state transition.
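The stubbed-out Job idea might look roughly like this. The interface and class names here are hypothetical (the real org.apache.hadoop.mapreduce.v2 Job interface has many more methods); the point is that a stub can report only a terminal state without carrying any recovered state, which is exactly what the JobImpl casts prevent today.

```java
// Hypothetical minimal sketch: a Job stub that only reports a terminal
// state, for the recovery path that never wires up a full JobImpl.
// Names are illustrative, not the real MapReduce v2 interfaces.
interface JobView {
  String getId();
  JobState getState();
}

enum JobState { NEW, RUNNING, SUCCEEDED, FAILED, KILLED }

final class CompletedJobStub implements JobView {
  private final String id;
  private final JobState terminalState;

  CompletedJobStub(String id, JobState terminalState) {
    if (terminalState == JobState.NEW || terminalState == JobState.RUNNING) {
      throw new IllegalArgumentException("stub only represents finished jobs");
    }
    this.id = id;
    this.terminalState = terminalState;
  }

  @Override public String getId() { return id; }
  @Override public JobState getState() { return terminalState; }
}
```

Code that only ever asks for the state could work against JobView and never need the JobImpl cast.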
Oozie handles duplicate notifications correctly, treating them as a NOP.
Great. I will look at the javadocs for job end notification again to be sure that we can default to notify instead.
Using separate files for marking success/failure - am guessing this is to have a smaller chance of a failing persist, as compared to persisting events via the HistoryFile, which may already have a backlog of events?
It was also a much smaller change to make. The HistoryFile would be preferable if we wanted to guarantee at-most-once commit of the tasks, because there are so many of them.
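The separate-marker-file idea can be sketched with plain java.nio for a self-contained example (HDFS would use FileSystem/Path instead). The file names here are made up for illustration; the patch's actual marker names may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: the committer drops a zero-length success or fail
// marker in the staging directory, and a recovered AM checks for them
// before deciding how to finish. Marker names are hypothetical.
final class CommitMarkers {
  static final String SUCCESS = "COMMIT_SUCCESS";
  static final String FAIL = "COMMIT_FAIL";

  static void markSuccess(Path stagingDir) throws IOException {
    Files.createFile(stagingDir.resolve(SUCCESS)); // zero-length touch
  }

  static void markFail(Path stagingDir) throws IOException {
    Files.createFile(stagingDir.resolve(FAIL));
  }

  static boolean committed(Path stagingDir) {
    return Files.exists(stagingDir.resolve(SUCCESS));
  }

  static boolean failed(Path stagingDir) {
    return Files.exists(stagingDir.resolve(FAIL));
  }
}
```

Two tiny files are cheap to create and check, whereas replaying a backlogged history file to answer "did the commit happen?" is both slower and more likely to fail partway.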
Wondering if it's possible to achieve the same checks via the CommitterEventHandler instead of checking in the MRAppMaster class, i.e. follow the regular recovery path, except the CommitHandler emits success/failed/abort events depending on the presence of these files / history events.
Alternately, the current implementation could be simplified by using a custom RMCommunicator which does not depend on JobImpl, i.e. the history copier plus an RMCommunicator to unregister from the RM.
Both of those seem like valid things to investigate. I feel like I am close on this and want to get this working as is first and then I will look at the other approaches you suggested. I do like the first one as it seems like it would be a lot simpler to implement, but I want a backup that I know functions before making drastic changes to the design.
If the last AM attempt were to crash - data exists since the SUCCESS file exists, but RPC clients will not see SUCCESS.
We have a lot of problems in general if the last AM were to crash. It is possible that the history server would have no knowledge of the job whatsoever, even if it finished successfully. This patch is not attempting to address those problems.
While the new AM is running, it will not be able to handle status, counter, etc. requests. This seems a little problematic if a success has already been reported over RPC by the previous AM. Since this AM is dealing with the history file, could it possibly return information from the history file? Committing the history before reporting SUCCESS may help with the previous two points.
Yes, committing the history before returning success would help with those problems. I will look into it as an alternative approach. My initial thought was to update the client/UI to wait for the AM to report a valid address, so that no client tries to get counters etc. from an AM in this situation.
If the recovered AppMaster is not the last retry, it looks like the RM unregistration will not happen (isLastAMRetry).
isLastAMRetry is set in a number of places, including in the init method if we notice that the previous Job ended but the AM crashed.
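A rough sketch of the kind of check isLastAMRetry rests on, assuming (as described above) that the flag is also flipped when a previous attempt finished the job but crashed before unregistering. The method signature is hypothetical; the real logic lives in MRAppMaster and consults configuration.

```java
// Illustrative only: the count-based part compares this attempt's id with
// the configured maximum; the recovery case forces "last retry" because a
// finished job has nothing left to retry.
final class RetryCheck {
  static boolean isLastAMRetry(int attemptId, int maxAppAttempts,
                               boolean previousJobFinished) {
    return previousJobFinished || attemptId >= maxAppAttempts;
  }
}
```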
Is a KILLED status also required? KILLED during commit should not be reported as FAILED.
That would be nice. We would have to put it in as part of CommitterEventHandler.cancelJobCommit(). I will look into that.
CommitterEventHandler.touchz could throw an exception if the file already exists - to prevent lost AMs from committing. (maybe not required after
I think it already will. We are not opening the file for append, we are trying to create it.
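The exclusive-create behavior being relied on here can be shown with java.nio for a self-contained example. HDFS behaves the same way when a file is created without the overwrite flag (FileSystem.create(path, false)): a second creator gets an exception rather than silently clobbering the first marker.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of an exclusive "touch": the first AM to create the marker wins,
// and a lost AM that tries later finds out instead of committing twice.
final class ExclusiveTouch {
  /** Returns true if we created the marker, false if another AM beat us to it. */
  static boolean tryTouch(Path marker) throws IOException {
    try {
      Files.createFile(marker); // fails if the file already exists
      return true;
    } catch (FileAlreadyExistsException e) {
      return false;
    }
  }
}
```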
historyService creation - can move into the common if (copyHistory) check
I don't think the "AMStartedEvent" can be ignored - the history server will have no info about past AMs. I think only the current AM's event needs to be ignored.
The AMStartedEvent is ignored by the copy service but not by the MRAppMaster. The MRAppMaster will read the history file just like it did before, extract the AMStartedEvents, add in another one for itself, and then the copyHistoryService will read the rest of the history file.
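The division of labor described above can be sketched with simplified string events standing in for real history events; the method names and event encoding here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: the AM keeps every past attempt's "AM started"
// record and appends one for itself, while the copy service forwards
// everything else from the old history file.
final class HistoryReplay {
  static List<String> amStartedEvents(List<String> oldHistory, String thisAm) {
    List<String> out = new ArrayList<>();
    for (String ev : oldHistory) {
      if (ev.startsWith("AM_STARTED")) {
        out.add(ev); // keep every past attempt's record
      }
    }
    out.add("AM_STARTED:" + thisAm); // add the current attempt
    return out;
  }

  static List<String> eventsForCopyService(List<String> oldHistory) {
    List<String> out = new ArrayList<>();
    for (String ev : oldHistory) {
      if (!ev.startsWith("AM_STARTED")) {
        out.add(ev); // copy service ignores AM started events
      }
    }
    return out;
  }
}
```

So no AMStartedEvent is lost: past attempts' records survive because the AM replays them, and only the copy service skips that event type.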
Wondering if it's possible to use HDFS dirs and timestamps to co-ordinate between an active AM and lost AMs.
Also, are HDFS dir operations cheaper than file create operations (NN only vs. NN + DN)? Not sure if mkdir / 0-length file creation are NN-only ops.
I thought that they were NN only ops, but I will check with an HDFS person to know for sure.