Attaching a patch that implements JT restart using JobHistory.
Currently the job history filename is of the following format history-timestamp_jt-hostname_jobid_username_jobname. It was introduced in
HADOOP-239 and the timestamp was added in the beginning since the job names were not unique. It makes it difficult to guess the job history filename with history-timestamp. So history-timestamp is removed as currently job-id is unique across restarts.
So for now we define
master-file = jt-hostname_jobid_username_jobname.
tmp-file = master-file.tmp
0) Upon restart the JT goes in safe mode. In safe mode all the trackers are asked to resend/replay their heartbeat.
1) For a new job, the history file is the master-file. For a restarted job, the history is written to the tmp file.
2) Following checks are made for a recovered job
2.1) If the master file exists then delete the tmp file
2.2) If the master file is missing then make the tmp file as master
3) Upon restart the master-file is read and default-history-parser is used to parse and recover history records. These records are used to create taskStatus which is replayed in order. Before replaying the JT waits for the jobs to be inited.
4) Once the replay is over, delete the master-file to indicate that the tmp file is more recent. Note that on next restart the tmp file will be used for recovery.
5) Once all the jobs are recovered, turn off the safe mode. JT will now process heartbeats (called as successful re-connect). Also the registration window timer starts. JT waits for tracker-expiry-interval time after last-tracker-re-connect before closing the window. Once the window closes, JT is considered as recovered. This plays an important role in detecting the trackers that went down while the JT was down. Upon recovery, JT re-executes all the tasks that were on the lost trackers.
6) Since the history can have some data missing, there can be a case where the map-completion-event-list at the JT is smaller than the one at the tracker. Hence there is a rollback required upon restart. Once the JT is out of safe mode, it passes this information (map-events-list-size) to the tracker on the successful reconnect.
7) The tasktracker rollbacks few events and asks the child tasks to reset their index to 0. Child tasks fetches all the events back and filters out necessary events for further processing. This is similar to the one discussed in approach #1.
8) Errors in history can cause the parser to fail. We have
HADOOP-2403 to address this. For now this patch encodes errors. This will replaced with the fix in HADOOP-2403.
9) Currently counters are stringified and written to history. It is not possible to recover the counter back from the string and hence this patch encodes the counter-names so that they can be easily recovered. Note that there is no encoding in the user space. Only the frameworks history file has codes.
10) Once the job finishes the tmp file is renamed to master-file. Similarly the history files in the user directory also follow the same renaming cycle.
11) Job priority is logged on every change and hence its recovered.
1) This approach/patch works fine with history on local fs. With history on HDFS, the history file becomes visible but not available (i.e file-size = 0). The file becomes available only on close(). Sync() documentation indicates that the file-data availability is not guaranteed.
2) Detecting job runtime is still an issue.
We are working on it.
1) Refactor common code.
2) Remove extra logs
3) For ease of testing JT killing facility is added to web-ui. There is some extra code to support this. Clear it out.
4) To test the usage of sync(), there are periodic syncs done to the history files. This is just for testing.
5) Optimize encoding/decoding.
6) Group together all the recovery code under something like JobTrackerRecoveryManager.
Note that the logs/debugging-code/testing-code is still a part of this patch as I am testing it.