
Key: HADOOP-4372
Type: Improvement
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Amar Kamat
Reporter: Amar Kamat
Votes: 1
Watchers: 2
Hadoop Common

Improve the way the job history files are managed during job recovery

Created: 08/Oct/08 08:27 AM   Updated: 08/Jul/09 04:53 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.21.0

Time Tracking:
Not Specified

File Attachments (text files, licensed for inclusion in ASF works):
  Name | Date | Author | Size
  HADOOP-4372-v1.4.patch | 2009-04-15 12:44 PM | Amar Kamat | 17 kB
  HADOOP-4372-v1.patch | 2008-10-24 08:54 AM | Amar Kamat | 15 kB
  HADOOP-4372-v3.0.patch | 2009-05-04 12:31 PM | Amar Kamat | 3 kB
  HADOOP-4372-v3.1.patch | 2009-05-07 09:18 AM | Amar Kamat | 7 kB

Hadoop Flags: Reviewed
Resolution Date: 08/May/09 08:33 AM


Description
Today we use the .recover technique to handle the job history files when the jobtracker restarts. The comment here proposes a better way to handle the files.

Comments
Amar Kamat added a comment - 24/Oct/08 08:54 AM
Attaching a patch that implements the basic idea. This patch uses a new filename on recovery. Updated the testcase accordingly.

Amar Kamat added a comment - 15/Apr/09 12:44 PM
Result of test-patch
[exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Ant tests passed on my box.


Devaraj Das added a comment - 17/Apr/09 11:18 AM
Could we instead pass a boolean to logSubmitted based on the restart-count value (a restart count of 0 means a new job, in which case the boolean is true), and then create the recovery file inside the jobHistory only if the boolean is false?
The problem with the current patch is that the filename of the history file changes upon restart.
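
For illustration only, the suggestion above could look roughly like the following Java sketch. It is not code from any of the attached patches: the class name `RestartFlagSketch`, the helper names, and the `.recover` suffix handling are all hypothetical. The only parts taken from the comment are that a restart count of 0 marks a new job and that the history filename itself should stay the same across restarts.

```java
// Hypothetical sketch; none of these names come from the Hadoop source.
class RestartFlagSketch {

    // A restart count of 0 means the job has never been recovered, so it is new.
    static boolean isNewJob(int restartCount) {
        return restartCount == 0;
    }

    // Returns the name of the file that would be created at submission time.
    // The history filename is unchanged across restarts; a recovered job
    // only adds a recovery file alongside it (assumed ".recover" suffix).
    static String fileToCreate(String historyFileName, boolean isNewJob) {
        return isNewJob ? historyFileName : historyFileName + ".recover";
    }

    public static void main(String[] args) {
        String name = "job_1_history"; // assumed filename, for illustration
        System.out.println(fileToCreate(name, isNewJob(0)));
        System.out.println(fileToCreate(name, isNewJob(2)));
    }
}
```

The point of the design is that callers only need the restart count, and the base filename never depends on how many times the jobtracker has restarted.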

Amar Kamat added a comment - 04/May/09 12:31 PM
Attaching a patch that simply passes a boolean indicating whether the job is old or new. Testing in progress.

Amar Kamat added a comment - 04/May/09 01:26 PM
Changes to src/mapred/mapred-default.conf are not required. Please ignore that.

Amar Kamat added a comment - 07/May/09 09:18 AM
Attaching a patch that optimizes jobhistory for new jobs. Result of test-patch
[exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Ant tests passed on my box.


Ravi Gummadi added a comment - 08/May/09 06:24 AM
Patch looks good.
+1

Devaraj Das added a comment - 08/May/09 08:33 AM
I just committed this. Thanks, Amar!

Hudson added a comment - 08/May/09 07:55 PM
Integrated in Hadoop-trunk #830 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/830/)
Improves the way history filenames are obtained and manipulated. Contributed by Amar Kamat.

Devaraj Das added a comment - 21/May/09 09:02 AM
I committed this to the 0.20 branch as well. There have been problems such as job submission taking a long time when the number of files in the history folder is too large. This patch introduced an API that speeds up obtaining the history file for a new job (earlier it used to scan the history folder, which is costly when the number of files is large).
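
To make the cost difference concrete, here is a rough Java sketch of the idea, under stated assumptions. It does not reproduce the API added by the patch: the class `HistoryFileLookupSketch`, its methods, and the `<jobId>_<user>` naming scheme are invented for illustration, and the history folder is modelled as an in-memory list of filenames.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the API introduced by the patch.
class HistoryFileLookupSketch {

    // Assumed naming scheme, for illustration only: "<jobId>_<user>".
    static String newJobFileName(String jobId, String user) {
        return jobId + "_" + user;
    }

    // Linear scan over the history folder; cost grows with the number of files.
    static String findExisting(List<String> historyDir, String jobId) {
        for (String name : historyDir) {
            if (name.startsWith(jobId + "_")) {
                return name;
            }
        }
        return null;
    }

    // New jobs skip the scan entirely; only recovered jobs look for an old file.
    static String historyFileName(List<String> historyDir, String jobId,
                                  String user, boolean isNewJob) {
        if (isNewJob) {
            return newJobFileName(jobId, user);
        }
        String existing = findExisting(historyDir, jobId);
        return existing != null ? existing : newJobFileName(jobId, user);
    }

    public static void main(String[] args) {
        List<String> dir = Arrays.asList("job_1_alice", "job_2_bob");
        System.out.println(historyFileName(dir, "job_3", "carol", true));
        System.out.println(historyFileName(dir, "job_2", "bob", false));
    }
}
```

In this sketch a new job never touches the folder listing, so its submission time does not depend on how many history files already exist.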

Hudson added a comment - 11/Jun/09 07:59 PM