Hadoop Map/Reduce / MAPREDUCE-130

Delete the jobconf copy from the log directory of the JobTracker when the job is retired

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      When a job is initialized, the JobTracker localizes the job conf to the logs dir. Without this patch, it never gets deleted. Now, when the job retires, the conf is deleted. The local copy is required for display on the web UI.

      Description

      The JobTracker (for web-ui viewing purposes) copies the jobconf from HDFS and stores it in the log directory. The file should be deleted when the job is retired (removed from memory).
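The lifecycle described above can be sketched roughly as follows. This is an illustration only, not the attached patch: the class `JobConfRetirementSketch`, its methods, and the `<jobId>_conf.xml` file name are assumptions made for this example, and plain JDK file APIs stand in for the Hadoop classes.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the JobTracker's handling of local job conf copies. */
final class JobConfRetirementSketch {
    private final File logDir;
    // Job ids currently held in memory (assumed representation).
    private final Set<String> jobsInMemory = ConcurrentHashMap.newKeySet();

    JobConfRetirementSketch(File logDir) {
        this.logDir = logDir;
    }

    /** Assumed naming scheme for the localized copy. */
    File localConfFile(String jobId) {
        return new File(logDir, jobId + "_conf.xml");
    }

    /** On job initialization: write the local copy and track the job in memory. */
    void initJob(String jobId, String confXml) {
        try {
            Files.write(localConfFile(jobId).toPath(),
                        confXml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        jobsInMemory.add(jobId);
    }

    boolean isInMemory(String jobId) {
        return jobsInMemory.contains(jobId);
    }

    /**
     * On job retirement: drop the job from memory and delete its local conf.
     * Returns true if no local copy remains afterwards.
     */
    boolean retireJob(String jobId) {
        jobsInMemory.remove(jobId);
        File conf = localConfFile(jobId);
        return !conf.exists() || conf.delete();
    }

    public static void main(String[] args) {
        File logDir = new File(System.getProperty("java.io.tmpdir"), "jt-logs-sketch");
        logDir.mkdirs();
        JobConfRetirementSketch jt = new JobConfRetirementSketch(logDir);
        jt.initJob("job_0001", "<configuration/>");
        System.out.println("before retire: " + jt.localConfFile("job_0001").exists());
        jt.retireJob("job_0001");
        System.out.println("after retire: " + jt.localConfFile("job_0001").exists());
    }
}
```

The point of the sketch is only that removal from memory and removal of the local file happen in the same step, so the file cannot outlive the in-memory job under normal operation.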

      1. HADOOP-5995-v1.0.patch
        3 kB
        Amar Kamat
      2. HADOOP-5995-v1.1.patch
        3 kB
        Amar Kamat
      3. MAPREDUCE-130-v1.0.patch
        3 kB
        Amar Kamat
      4. MAPREDUCE-130-v1.0-branch-0.20.patch
        3 kB
        Amar Kamat

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #15 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/15/)

          Amar Kamat added a comment -

          Upon restart the new jobtracker has no idea about the old completed jobs. I have opened MAPREDUCE-700 to address this. One solution is to have just one copy of jobconf with the jobtracker and delete it upon restart or job expiry. The ideal place to have it is the jobtracker subdir which gets cleaned up upon restart.
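The "delete it upon restart" idea could look something like the sketch below. It is not what was committed for this issue or for MAPREDUCE-700; the class `StaleJobConfSweeper`, the `_conf.xml` suffix, and the `jobsToKeep` parameter (jobs recovered after the restart, whose conf files must survive) are all assumptions made for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical startup sweep over the JobTracker log directory. */
final class StaleJobConfSweeper {
    // Assumed naming scheme for localized job conf files.
    static final String CONF_SUFFIX = "_conf.xml";

    /**
     * Deletes every *_conf.xml file in logDir whose job id is not in
     * jobsToKeep and returns the files that were actually removed.
     */
    static List<File> sweep(File logDir, Set<String> jobsToKeep) {
        List<File> removed = new ArrayList<>();
        File[] candidates = logDir.listFiles((dir, name) -> name.endsWith(CONF_SUFFIX));
        if (candidates == null) {
            return removed; // logDir is not a directory, or an I/O error occurred
        }
        for (File f : candidates) {
            String name = f.getName();
            String jobId = name.substring(0, name.length() - CONF_SUFFIX.length());
            if (!jobsToKeep.contains(jobId) && f.delete()) {
                removed.add(f);
            }
        }
        return removed;
    }

    /** Usage: java StaleJobConfSweeper <logDir> [jobIdToKeep ...] */
    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: StaleJobConfSweeper <logDir> [jobIdToKeep ...]");
            return;
        }
        Set<String> keep = new HashSet<>(Arrays.asList(args).subList(1, args.length));
        for (File f : sweep(new File(args[0]), keep)) {
            System.out.println("removed " + f);
        }
    }
}
```

Passing the recovered job ids explicitly matters because a restarted JobTracker may resume jobs that were still running, and their conf files are still needed.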

          Ramya Sunil added a comment -

          The above fix does not work in one special case. If the JT is restarted after a job completes but before it retires, the conf files are not deleted from the logdir. This will still lead to accumulation in the logdir of conf files for all jobs that were completed but not yet retired when the JT was restarted. The issue is more pronounced when the "retirejob interval" is very high.

          Sharad Agarwal added a comment -

          Also committed to 0.20 branch.

          Amar Kamat added a comment -

          Attaching a patch for branch 0.20

          Sharad Agarwal added a comment -

          I committed this. Thanks Amar!

          Amar Kamat added a comment -

          Attaching a patch for mapreduce. This patch applies cleanly on my box.

          Amar Kamat added a comment -

          Opened HADOOP-6075 for TestTaskTrackerMemoryManager failure.

          Amar Kamat added a comment -

          Found corresponding JIRAs for the test failures.
          HADOOP-6042 for TestJobTrackerRestartWithLostTracker

          Amar Kamat added a comment -

          Ant tests passed except:

          Name                                                          | Type   | Result                | Link
          org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker | FAILED | Second time it passed | ?
          org.apache.hadoop.mapred.TestKillSubProcesses                 | FAILED | Passed second time    | Failed on trunk too/HADOOP-6041
          org.apache.hadoop.mapred.TestReduceFetch                      | FAILED | Known issue           | HADOOP-6029
          org.apache.hadoop.mapred.TestTaskTrackerMemoryManager         | FAILED | Passed second time    | ?

          test-contrib tests passed except:

          Name                                                | Type             | Result      | Link
          org.apache.hadoop.streaming.TestStreamingExitStatus | FAILED           | Known issue | HADOOP-5906
          org.apache.hadoop.streaming.TestStreamingStderr     | FAILED (timeout) | Known issue | HADOOP-6062
          Iyappan Srinivasan added a comment -

          All test scenarios passed. The scenarios are described below.

          Parameters were set as follows:

          mapred.jobtracker.retirejob.check=10
          mapred.jobtracker.retirejob.interval=10

          1) A job is launched and killed. Conf.xml is removed from
          jobtracker/logdir after around 10 seconds.

          2) A job is launched and completed. Conf.xml is removed from
          jobtracker/logdir after around 10 seconds.

          3) About 10 seconds after a job completes, the .xml file should disappear from the logs directory and the job details should no longer be found in the "Completed" section of the jobtracker.jsp front page. But they should be found in the job History.

          4) About 10 seconds after a job is killed, the .xml file should disappear from the logs directory and the job details should no longer be found in the "Failed" section of the jobtracker.jsp front page. But they should be found in the job History.

          5) Launch multiple jobs of multiple types (like sleep, randomwriter) and allow them to complete successfully. After around 10 seconds, the corresponding .xml files should disappear from the logs directory and the job details should no longer be found in the "Completed" section of the jobtracker.jsp front page. But they should be found in the job History.

          6) Launch multiple jobs of multiple types (like sleep, randomwriter) and kill them. After around 10 seconds, the corresponding .xml files should disappear from the logs directory and the job details should no longer be found in the "Failed" section of the jobtracker.jsp front page. But they should be found in the job History.

          7) Whether the job completes successfully or is killed, the job History should display properly with all links, including the job conf link, showing proper values.

          8) Start a sleep job, allow the maps to reach 50% completion, and then kill the job tracker. Restart the job tracker and confirm that the job continues from where it left off. Check the logs directory. There should be only one .xml file. After the job completes, the .xml file should be removed after 10 seconds, and the job should be seen under the job tracker history link and not on the front page. Make sure all links are proper, including the job conf.
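The "removed after around 10 seconds" checks in the scenarios above could be automated by polling for the file, roughly as sketched here. The class `ConfRemovalCheck` and its parameters are hypothetical and not part of the tests attached to this issue; the scenarios above were verified by hand.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/** Hypothetical polling helper for "file disappears within N ms" checks. */
final class ConfRemovalCheck {
    /**
     * Polls until the file no longer exists or the timeout expires.
     * Returns true if the file is gone, false if it is still present.
     */
    static boolean awaitDeletion(File file, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (file.exists()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // still present when the timeout expired
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return !file.exists();
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a job conf file; nothing deletes it here, so this times out.
        File conf = File.createTempFile("job_0001", "_conf.xml");
        System.out.println("gone within 200 ms: " + awaitDeletion(conf, 200, 20));
        conf.delete();
    }
}
```

In a real check the timeout would be chosen with some slack over the configured retire interval, since retirement only happens on the next periodic check.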

          Amareshwari Sriramadasu added a comment -

          Changes look fine to me.

          Amar Kamat added a comment -

          Attaching a patch incorporating Amareshwari's comments.
          Result of test-patch
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Running ant tests now

          Amareshwari Sriramadasu added a comment -

          Minor comments:
          1. change the comment from cleanup history to cleanup local conf
          2. Move the test from testJobHistoryUserLogLocation to testJobHistoryFile

          Amar Kamat added a comment -

          Result of test-patch
          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          The Findbugs warning concerns the ignored return value of file.delete(). I don't think it's important. At most we can log it. Running ant tests now.
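For reference, checking the return value and logging a failure might look like the sketch below. It is a guess at the shape of such a fix, not the code in the attached patch; the class and method names are hypothetical, and `java.util.logging` is used here only to keep the example self-contained.

```java
import java.io.File;
import java.util.logging.Logger;

/** Hypothetical helper showing one way to avoid ignoring File.delete()'s result. */
final class DeleteWithLogging {
    private static final Logger LOG = Logger.getLogger(DeleteWithLogging.class.getName());

    /** Deletes the file and logs a warning if the deletion did not happen. */
    static boolean deleteAndLog(File file) {
        boolean deleted = file.delete();
        if (!deleted) {
            LOG.warning("Failed to delete " + file);
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Deleting a path that does not exist returns false and logs a warning.
        File missing = new File(System.getProperty("java.io.tmpdir"), "no-such-job_conf.xml");
        System.out.println("deleted: " + deleteAndLog(missing));
    }
}
```

`File.delete()` reports failure through its boolean result rather than an exception, which is why ignoring it gets flagged.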

          Amar Kamat added a comment -

          Attaching a simple fix. Running test-patch now.


            People

            • Assignee:
              Amar Kamat
              Reporter:
              Devaraj Das