Hadoop Map/Reduce / MAPREDUCE-1920

Job.getCounters() returns null when using a cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Calling Job.getCounters() after the job has completed (successfully) returns null.

      1. MAPREDUCE-1920.patch
        3 kB
        Tom White
      2. MAPREDUCE-1920.patch
        16 kB
        Tom White
      3. MAPREDUCE-1920.patch
        10 kB
        Tom White
      4. MAPREDUCE-1920.patch
        2 kB
        Tom White

          Activity

          Aaron Kimball added a comment -

          The new API seems to have an issue w.r.t. counters. Calling Job.getCounters() after the job has completed (successfully) returns null. I can see all the counters there on the JobTracker status web page. They have the correct values. But I can't access them programmatically.

          So, this is returning null:

          public class Job extends JobContextImpl implements JobContext {
          
           ...
          
            public Counters getCounters()
                throws IOException, InterruptedException {
              ensureState(JobState.RUNNING);
              return cluster.getClient().getJobCounters(getJobID());
            }
          
          }
          

          This seems to work fine with the LocalJobRunner.

          Amareshwari Sriramadasu added a comment -

          Are you sure that the job has not been retired? I strongly feel this should not break, because many unit tests call this API. For example, TestMiniMRDFSSort calls this API and runs successfully on branch 0.21.

          Aaron Kimball added a comment -

          I agree that this shouldn't break. And yet, I configured MapReduce as a straight-up pseudo-distributed instance. I didn't set anything other than mapred.job.tracker and fs.default.name in the conf files.

          My application calls job.getCounters() immediately after job.waitForCompletion() returns. Is it possible that jobs are being retired instantaneously (or "very quickly") in a way that races with my application? Is there a guaranteed window of time during which a job won't be retired?

          I feel like there should be a guaranteed minimum; maybe this is in time, maybe as long as the original reference to a Job object on the client is live? (Easier said than done in the latter case – maybe the Job could be configured in such a way as to reserve the right to retrieve its Counters or other post-execution data at least once after waitForCompletion() returns?)
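          The race suspected here can be sketched with a toy, in-memory model. This is NOT Hadoop code: ToyJobTracker, completeJob(), retire() and getJobCounters() are hypothetical names, and the model simply assumes that retiring a job discards its counters, as the later comments in this thread confirm.

```java
import java.util.HashMap;
import java.util.Map;

// Toy, in-memory model of the suspected behaviour; NOT Hadoop code.
// ToyJobTracker, completeJob(), retire() and getJobCounters() are hypothetical names.
class ToyJobTracker {
    // Counters for jobs the tracker still holds in full, keyed by job ID.
    private final Map<String, Map<String, Long>> liveJobCounters = new HashMap<>();

    // Called when a job finishes: its counters are still available at this point.
    void completeJob(String jobId, Map<String, Long> counters) {
        liveJobCounters.put(jobId, new HashMap<>(counters));
    }

    // Retirement discards the job's full record, counters included.
    void retire(String jobId) {
        liveJobCounters.remove(jobId);
    }

    // Returns null once the job has been retired, mirroring the reported symptom.
    Map<String, Long> getJobCounters(String jobId) {
        return liveJobCounters.get(jobId);
    }
}
```

          If retire() runs between waitForCompletion() returning and the client's getCounters() call, the client sees null even though the job succeeded.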

          Amareshwari Sriramadasu added a comment -

          Can you bring up your cluster with mapreduce.jobtracker.retirejobs set to false and run your job? That would confirm whether the problem is caused by job retirement.
          You can also enable the completed job store by setting mapreduce.jobtracker.persist.jobstatus.active to true and mapreduce.jobtracker.persist.jobstatus.hours to 1. Then job details would be available for an hour.
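          The "available for an hour" behaviour can be modelled as a simple retention check. This is a hypothetical helper (JobStatusRetention is not a Hadoop class) that assumes a persisted job is served only while its age is below mapreduce.jobtracker.persist.jobstatus.hours; it is not the actual CompletedJobStatusStore cleanup logic.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper modelling the retention window implied by
// mapreduce.jobtracker.persist.jobstatus.hours; not Hadoop's implementation.
final class JobStatusRetention {
    private JobStatusRetention() {}

    /** True if a job that completed at completedAt is still within the retention window at now. */
    static boolean isStillAvailable(Instant completedAt, Instant now, int persistHours) {
        Duration age = Duration.between(completedAt, now);
        return age.compareTo(Duration.ofHours(persistHours)) < 0;
    }
}
```

          With persistHours = 1, a client asking 30 minutes after completion is inside the window; one asking two hours later is not.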

          Aaron Kimball added a comment -

          This is indeed the issue. Setting mapreduce.jobtracker.retirejobs to false allows things to run correctly.

          If I remove that setting, then it fails. I think this indicates a need for some sort of delay before retiring jobs. Otherwise the job client does not even display the counters on stdout when the job finishes, which is unexpected.

          What is the best option going forward? Some that I can think of:

          • mapred-default.xml could enable the completed job store for 1 hr by default. Power users could override this if they need to.
          • We could add some code to delay job retirement for some minimum amount of time (10 minutes?).
          • If the JobClient is still connected to the JT when the job finishes, the interaction could be modified to locally-cache a copy of the counters before retiring the job. Then existing references to the Job would have a guaranteed instance of the Counters available.
          • At the very least, Job.getCounters() needs a javadoc comment that specifies that it may return null. I think this is an incompatible change from 0.20. This suggestion is in addition to any of the above three.
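          The third option (cache the counters on the client before the job can be retired) could look roughly like the sketch below. CachingCounters is a hypothetical class, not part of Hadoop; a Supplier stands in for the remote call (for example Job.getCounters()), which is assumed to return null once the job is retired. Note that the cache only helps if the snapshot is taken before retirement, which is why the fourth option (documenting that null is possible) still applies.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of the "locally cache a copy of the counters" option; hypothetical class.
// The Supplier stands in for a remote call that may return null after retirement.
final class CachingCounters {
    private final Supplier<Map<String, Long>> remote;
    private Map<String, Long> cached; // first non-null snapshot seen, or null

    CachingCounters(Supplier<Map<String, Long>> remote) {
        this.remote = remote;
    }

    /** Call as soon as the job finishes, before it can be retired. */
    void snapshotOnCompletion() {
        Map<String, Long> fresh = remote.get();
        if (fresh != null) {
            // Copy, then wrap, so later changes to the source map are not visible.
            cached = Collections.unmodifiableMap(new HashMap<>(fresh));
        }
    }

    /** Returns the cached counters, trying the remote source if none are cached; may still be null. */
    Map<String, Long> get() {
        if (cached == null) {
            snapshotOnCompletion();
        }
        return cached;
    }
}
```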
          Tom White added a comment -

          According to a comment in JobTracker#retireJob(), mapreduce.jobtracker.retirejobs is "primarily for testing" (I assume because if you set it to false on a real installation, the JT eventually runs out of memory?).

          I think enabling the completed job store for 1 hour is the most natural workaround. Here's a patch that enables it by default. The patch also adds javadoc to Job#getCounters().

          Aaron, does this patch fix the issue for you?

          Aaron Kimball added a comment -

          Tom,

          I am using the "combined/old-style" tarball of 0.21 rc 1. I applied your patch in the mapred/ directory and it applied, but I could not compile it because of:

          /home/aaron/Desktop/hadoop-0.21.0/mapred/build.xml:24: Cannot find build-utils.xml imported from /home/aaron/Desktop/hadoop-0.21.0/mapred/build.xml
          

          Is that an issue with the way you produce the combined tarball? Or a more general release bug that prevents it from self-hosting?

          The patch itself looks good though: +1
          I changed the two affected settings in my mapred-site.xml file and my job succeeded.

          Thanks!

          Aaron
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12448971/MAPREDUCE-1920.patch
          against trunk revision 961578.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/293/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/293/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/293/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/293/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          According to a comment in JobTracker#retireJob(), mapreduce.jobtracker.retirejobs is "primarily for testing" (I assume because if you set it to false on a real installation, the JT eventually runs out of memory?).

          Yes. The configuration is only for testing. If mapreduce.jobtracker.retirejobs is set to false, the jobs will never be retired.

          Currently, the JobTracker maintains a retired job cache which holds the JobStatus of retired jobs, but it does not hold counters. If we enable the completed job store by default, the data (JobStatus) will be duplicated. I think we should leave the default configuration as is and let users enable the completed job store if they are interested in counters. Also, we should mark MAPREDUCE-870 as an incompatible change and update the release note.
          Thoughts?

          Tom White added a comment -

          Actually, I would do it the other way round. Users expect to be able to get counters from jobs they have just run, as witnessed by Aaron's experience that led to this bug (also http://lucene.472066.n3.nabble.com/Hadoop-0-21-job-getCounters-returns-null-td947190.html). I would rather have the default configuration work as expected, and advanced users can turn off the job store if they don't want to use it. Does that sound reasonable?

          Amareshwari Sriramadasu added a comment -

          I was wondering whether we should disable the retired jobs cache if we enable the completed job store by default, to avoid storing duplicate data. But now I feel we can enable both, because the retired job cache is served from memory, whereas the completed job store is served from the file system. Clients are served from the retired jobs cache first and, if the job is not found there, from the completed job store.

          Attached patch looks fine to me.
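          The two-tier lookup described above can be sketched as a toy model. RetiredJobLookup and StoredJob are hypothetical names; the "completed job store" is a HashMap here rather than files on a file system. Following the earlier comment, the in-memory retired job cache keeps only status, so counters can come only from the store.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the retired-job lookup order; hypothetical names, not Hadoop code.
final class RetiredJobLookup {
    /** What the completed job store persists for a job: status and counters. */
    static final class StoredJob {
        final String status;
        final Map<String, Long> counters;

        StoredJob(String status, Map<String, Long> counters) {
            this.status = status;
            this.counters = counters;
        }
    }

    // In-memory retired job cache: status only, no counters.
    private final Map<String, String> retiredStatusCache = new HashMap<>();
    // Stand-in for the file-system-backed completed job store.
    private final Map<String, StoredJob> completedJobStore = new HashMap<>();
    private final boolean storeEnabled;

    RetiredJobLookup(boolean storeEnabled) {
        this.storeEnabled = storeEnabled;
    }

    void retire(String jobId, String status, Map<String, Long> counters) {
        retiredStatusCache.put(jobId, status);
        if (storeEnabled) {
            completedJobStore.put(jobId, new StoredJob(status, new HashMap<>(counters)));
        }
    }

    /** Served from the memory cache first, then from the store. */
    String getStatus(String jobId) {
        String cached = retiredStatusCache.get(jobId);
        if (cached != null) {
            return cached;
        }
        StoredJob stored = completedJobStore.get(jobId);
        return stored == null ? null : stored.status;
    }

    /** The cache holds no counters, so only the store can answer; otherwise null. */
    Map<String, Long> getCounters(String jobId) {
        StoredJob stored = completedJobStore.get(jobId);
        return stored == null ? null : stored.counters;
    }
}
```

          With the store disabled, status is still available for a retired job but counters are not, which matches the behaviour reported in this issue.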

          Tom White added a comment -

          Re-running through Hudson.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12448971/MAPREDUCE-1920.patch
          against trunk revision 980316.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/601/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/601/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/601/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/601/console

          This message is automatically generated.

          Tom White added a comment -

          New patch which fixes failing unit tests.

          Amareshwari Sriramadasu added a comment -

          I see that the patch adds conf.setUser(UserGroupInformation.getCurrentUser().getUserName()) to the JobConf in many test cases. I don't understand why that is needed here. I ran TestTrackerReservation and TestClusterStatus with the attached patch, and the tests are still failing.

          Tom White added a comment -

          Thanks for taking a look, Amareshwari. With the first patch I get the following failure for TestTrackerReservation:

          Testcase: testTaskTrackerReservation took 0.431 sec
                  Caused an ERROR
          null
          java.lang.NullPointerException
                  at org.apache.hadoop.io.Text.encode(Text.java:396)
                  at org.apache.hadoop.io.Text.encode(Text.java:377)
                  at org.apache.hadoop.io.Text.writeString(Text.java:417)
                  at org.apache.hadoop.mapreduce.JobStatus.write(JobStatus.java:339)
                  at org.apache.hadoop.mapred.CompletedJobStatusStore.store(CompletedJobStatusStore.java:178)
                  at org.apache.hadoop.mapred.JobTracker.storeCompletedJob(JobTracker.java:3427)
                  at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:3344)
                  at org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2889)
                  at org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2771)
                  at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1231)
                  at org.apache.hadoop.mapred.FakeObjectUtilities$FakeJobInProgress.finishTask(FakeObjectUtilities.java:186)
                  at org.apache.hadoop.mapred.TestTrackerReservation.testTaskTrackerReservation(TestTrackerReservation.java:138)
                  at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
                  at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
                  at junit.extensions.TestSetup.run(TestSetup.java:27)
          

          This is caused by a null user name. The second patch sets the user for the job, and passes for me.

          What failure are you getting with the second patch?
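          Why a null user name only shows up when the completed job store writes the JobStatus can be illustrated with a simplified analogue of length-prefixed string serialization. StringWire is hypothetical and its encoding (4-byte length plus UTF-8 bytes) is an assumption; Hadoop's Text.writeString uses its own encoding. The point is only that encoding a null string throws a NullPointerException at write time.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Simplified analogue of length-prefixed string serialization; not Hadoop's Text class.
final class StringWire {
    private StringWire() {}

    /** Writes a 4-byte length followed by UTF-8 bytes; a null string fails before anything is written. */
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // NullPointerException if s == null
        out.writeInt(bytes.length);
        out.write(bytes);
    }
}
```

          Setting the job's user up front (as the second patch does in the tests) means the field is never null when the status is persisted.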

          Amareshwari Sriramadasu added a comment -

          TestTrackerReservation failed with the following exception on my machine:

          Testcase: unknown took 0 sec
                  Caused an ERROR
          CompletedJobStatusStore mkdirs failed to create /jobtracker/jobsInfo
          java.io.IOException: CompletedJobStatusStore mkdirs failed to create /jobtracker/jobsInfo
                  at org.apache.hadoop.mapred.CompletedJobStatusStore.<init>(CompletedJobStatusStore.java:83)
                  at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1626)
                  at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1373)
                  at org.apache.hadoop.mapred.FakeObjectUtilities$FakeJobTracker.<init>(FakeObjectUtilities.java:59)
                  at org.apache.hadoop.mapred.TestTrackerReservation$FakeJobTracker.<init>(TestTrackerReservation.java:48)
                  at org.apache.hadoop.mapred.TestTrackerReservation$1.setUp(TestTrackerReservation.java:64)
                  at junit.extensions.TestSetup$1.protect(TestSetup.java:22)
                  at junit.extensions.TestSetup.run(TestSetup.java:27)
          

          TestClusterStatus also fails with similar error.

          Amareshwari Sriramadasu added a comment -

          I see the same errors with the earlier patch as well. The console output from the build of the patch above also confirms that.
          Tom, do you have a different setup on your machine?

          Tom White added a comment -

          The latest patch hasn't been run by Hudson yet, so let's see if it takes this time. I'll see if I can reproduce this error too (I ran it on a Mac when it passed).

          Tom White added a comment -

          The failures occur because the tests are trying to create /jobtracker/jobsInfo on the local file system. I've now fixed the tests that do this (and verified that they pass). I've also fixed TestMapredSystemDir, which was failing for a different reason.

          Here are the results of test-patch:

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 27 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          

          I think this is ready to go now. Amareshwari, would you be able to try this patch? Thanks!
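          One way a test could avoid creating /jobtracker/jobsInfo at the root of the local file system is to place the persistence directory under a per-test temporary directory. TestPersistDir is a hypothetical helper, and how its path would be wired into the JobTracker configuration is not shown here; the thread does not say exactly how the patch changed the tests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical test helper: create the job-status persistence directory under a
// per-test temporary directory instead of /jobtracker/jobsInfo at the file system
// root, which an ordinary user usually cannot create.
final class TestPersistDir {
    private TestPersistDir() {}

    static Path create(String testName) throws IOException {
        Path root = Files.createTempDirectory(testName + "-");
        // createDirectories also creates the intermediate "jobtracker" directory.
        return Files.createDirectories(root.resolve("jobtracker").resolve("jobsInfo"));
    }
}
```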

          Amareshwari Sriramadasu added a comment -

          +1
          Latest patch looks fine.

          Amareshwari Sriramadasu added a comment -

          I tried running ant test with the latest patch, but I see a lot of test timeouts. The tests are timing out because of the same error ("CompletedJobStore trying to create /jobtracker/jobsInfo on the local file system") and MAPREDUCE-1366. All the tests that work on the local FileSystem are timing out. I will post the list of failing tests once my ant test run finishes.

          Amareshwari Sriramadasu added a comment -

          Tests that have timed out so far:
          TestAdminOperationsProtocolWithServiceAuthorization
          TestClusterMRNotification
          TestDebugScript
          TestEmptyJob
          TestIsolationRunner
          TestJobCleanup
          TestJobHistory
          TestJobHistoryParsing
          TestJobInProgress
          TestJobInProgressListener
          TestJobKillAndFail
          TestJobQueueClient
          TestJvmReuse
          TestKillSubProcesses
          TestMRWithDistributedCache
          TestMapredHeartbeat
          TestMiniMRBringup

          Tests that failed:
          TestJobTrackerStart
          TestKillCompletedJob

          My local ant test run is still in progress, so more tests may be added to the list above.

          Shall we fix MiniMRCluster to set a persist dir on the local file system when the fileSystem passed in is local, instead of fixing these individual tests?
          Or shall we disable the completed job store for the unit tests by adding a conf in src/test/mapred-site.xml (similar to disabling retire jobs), since TestJobStatusPersistency already tests the functionality of completedJobStore?
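          The first option could look roughly like the sketch below. It is a hypothetical helper, not MiniMRCluster's actual code: it assumes the cluster's default file system is available as a `java.net.URI` and that each test has a scratch directory (the `build/test/data` path in the usage is made up). When the scheme is `file` (or absent), it returns a path under the scratch directory instead of the `/jobtracker/jobsInfo` location the failing tests tried to create.

```java
import java.io.File;
import java.net.URI;

class PersistDirChooser {
    // Location the completed job store tried to create, per this thread.
    static final String DEFAULT_PERSIST_DIR = "/jobtracker/jobsInfo";

    // Hypothetical helper: choose the completed-job-store directory for a test cluster.
    // On the local file system, keep the store under the test's scratch directory so
    // tests never try to write to /jobtracker on the host machine.
    static String choosePersistDir(URI defaultFs, File testScratchDir) {
        String scheme = defaultFs.getScheme();
        if (scheme == null || "file".equals(scheme)) {
            return new File(testScratchDir, "jobtracker/jobsInfo").getPath();
        }
        // Distributed file systems (e.g. HDFS) keep the default location.
        return DEFAULT_PERSIST_DIR;
    }

    public static void main(String[] args) {
        File scratch = new File("build/test/data");
        System.out.println(choosePersistDir(URI.create("file:///"), scratch));
        System.out.println(choosePersistDir(URI.create("hdfs://localhost:8020"), scratch));
    }
}
```

          The drawback, as the thread goes on to note, is that this touches cluster setup for every test, whereas switching the store off in the test configuration is a one-line change.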

          Tom White added a comment -

          > Or shall we disable completed job store for the unit tests by adding conf in src/test/mapred-site.xml (similar to disabling retire jobs) as TestJobStatusPersistency anyways tests the functionality of completedJobStore?

          I think this is a much better way of doing it. Thanks for the suggestion. I'll prepare a patch.

          Tom White added a comment -

          This patch (based on the first one) sets mapreduce.jobtracker.persist.jobstatus.active to false in the test mapred-site.xml. It passes all unit tests (I ran it on Linux). Here's the output of test-patch:

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec] 
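          For reference, an override like the one described above is a standard Hadoop `<configuration>`/`<property>`/`<name>`/`<value>` entry. The sketch below embeds such an entry as a string and reads it back with the JDK's DOM parser, the way a test might confirm that the completed job store is switched off. Only the property name and the value `false` come from this thread; the rest of the file contents, and the `TestSiteConfig` class and `lookup` method, are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TestSiteConfig {
    // Assumed shape of the test mapred-site.xml override.
    static final String TEST_MAPRED_SITE =
          "<?xml version=\"1.0\"?>\n"
        + "<configuration>\n"
        + "  <property>\n"
        + "    <name>mapreduce.jobtracker.persist.jobstatus.active</name>\n"
        + "    <value>false</value>\n"
        + "  </property>\n"
        + "</configuration>\n";

    // Returns the <value> of the <property> whose <name> equals key, or null if absent.
    static String lookup(String xml, String key) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = prop.getElementsByTagName("name").item(0).getTextContent().trim();
                if (name.equals(key)) {
                    return prop.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String active = lookup(TEST_MAPRED_SITE, "mapreduce.jobtracker.persist.jobstatus.active");
        // The flag is a true/false switch, so parse it as a boolean.
        System.out.println("completed job store enabled: " + Boolean.parseBoolean(active));
    }
}
```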
          
          Amareshwari Sriramadasu added a comment -

          I just committed this to trunk and branch 0.21.

          Thanks Tom!

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)


            People

            • Assignee:
              Tom White
              Reporter:
              Aaron Kimball
            • Votes:
              0
              Watchers:
              6
