Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The following test classes are failing on branch-1: TestJobTrackerRestartWithLostTracker, TestJobTrackerSafeMode, TestMiniMRMapRedDebugScript, TestRecoveryManager, TestTaskTrackerLocalization.

      Recent MR changes (I believe these tests were passing before they went in):

      • MAPREDUCE-4012 Hadoop Job setup error leaves no useful info to users. (tgrav
      • MAPREDUCE-1238. mapred metrics shows negative count of waiting maps and redu
      • MAPREDUCE-4017. Add jobname to jobsummary log (tgraves and Koji Noguchi via
      • MAPREDUCE-4003. log.index (No such file or directory) AND Task process exit

        Activity

        Eli Collins added a comment -

        Here are details of the failures:

        Testcase: testRestartWithLostTracker took 163.223 sec
                FAILED
        Tracker killed while the jobtracker was down did not get lost upon restart expected:<0> but was:<1>
        junit.framework.AssertionFailedError: Tracker killed while the jobtracker was down did not get lost upon restart expected:<0> but was:<1>
                at org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.testRecoveryWithLostTracker(TestJobTrackerRestartWithLostTracker.java:112)
                at org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.testRestartWithLostTracker(TestJobTrackerRestartWithLostTracker.java:163)
        
        Testcase: testJobTrackerSafeMode took 99.434 sec
                FAILED
        JobTracker has opened up scheduling before all the trackers were recovered
        junit.framework.AssertionFailedError: JobTracker has opened up scheduling before all the trackers were recovered
                at org.apache.hadoop.mapred.TestJobTrackerSafeMode.testSafeMode(TestJobTrackerSafeMode.java:177)
                at org.apache.hadoop.mapred.TestJobTrackerSafeMode.testJobTrackerSafeMode(TestJobTrackerSafeMode.java:267)
        
        Testcase: testMapDebugScript took 89.257 sec
                FAILED
        null
        junit.framework.AssertionFailedError: null
                at org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript.testMapDebugScript(TestMiniMRMapRedDebugScript.java:212)
        
        Testcase: testRecoveryManager took 67.941 sec
                FAILED
        Recovery manager failed to tolerate job failures expected:<2> but was:<0>
        junit.framework.AssertionFailedError: Recovery manager failed to tolerate job failures expected:<2> but was:<0>
                at org.apache.hadoop.mapred.TestRecoveryManager.testRecoveryManager(TestRecoveryManager.java:250)
        
        Testcase: testRestartCount took 72.487 sec
                Caused an ERROR
        null
        java.lang.NullPointerException
                at org.apache.hadoop.mapred.TestRecoveryManager.testRestartCount(TestRecoveryManager.java:361)
        
        
        Testcase: testTrackerReinit took 0.038 sec
                Caused an ERROR
        Could not find any valid local directory for ttprivate/taskTracker/eli/jobcache/job_200907202331_0001/jobToken
        org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for ttprivate/taskTracker/eli/jobcache/job_200907202331_0001/jobToken
        
        Thomas Graves added a comment -

        All of these tests are annotated with @Ignore and, when run individually, have been failing since long before the listed JIRAs.

        I was running them with: ant test -Dtestcase=foo so I'm not sure why that isn't honoring the @Ignore.

        Checking a jenkins output of running all the tests shows:

        [junit] Running org.apache.hadoop.mapred.TestRecoveryManager
        [junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.043 sec
        [junit] Running org.apache.hadoop.mapred.TestJobTrackerSafeMode
        [junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.042 sec
        ...

        I'm running a full test run now to verify that holds.

        Thomas Graves added a comment -

        I ran the tests on the 1.0.2 branch and the tests still fail:

        [junit] Test org.apache.hadoop.mapred.TestJobTrackerRestart FAILED (timeout)
        [junit] Running org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker
        [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 126.78 sec
        [junit] Test org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker FAILED
        [junit] Running org.apache.hadoop.mapred.TestJobTrackerSafeMode
        [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 75.936 sec

        Note that I was using ant 1.8.2. When I switched to ant 1.7.1, it honored the @Ignore:

        Testsuite: org.apache.hadoop.mapred.TestRecoveryManager
        Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.005 sec

        Eli, you might check the version of ant you are using. Please reopen if you think that isn't it.

        Eli Collins added a comment -

        Note that I was using ant 1.8.2. When I switched to ant 1.7.1, it honored the @Ignore:

        Thanks, Thomas. I'm using ant 1.8.2 as well; is this a known issue with ant?

        Eli Collins added a comment -

        Hm, actually, @Ignore works for me on HDFS tests using the same version of ant. I wonder what's different about MR here.

        Thomas Graves added a comment -

        I remember the ant version thing coming up on the mailing list before but don't remember details.

        Perhaps it's a dependency version issue or something in MR, if HDFS works. I'll try to poke a bit more to see if I can find anything.

        Tom White added a comment -

        I think this is happening because the tests are JUnit 3 tests (extending TestCase), but @Ignore is a JUnit 4 construct. Converting these tests to JUnit 4 tests should fix the problem. (I'm not sure what changed to cause these tests to start failing though.)
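
        The mechanism described above can be illustrated with a minimal sketch. This is a simplified model, not JUnit's actual code: it assumes JUnit 3 style discovery picks public no-argument methods whose names start with "test" and never looks at annotations, while an annotation-aware runner checks for a marker first. The names IgnoreSketch, Skip, SampleTest, selectByName and selectHonoringSkip are all hypothetical; Skip stands in for org.junit.Ignore so the sketch compiles without a JUnit jar.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class IgnoreSketch {

    // Hypothetical stand-in for org.junit.Ignore (assumption: no JUnit on the classpath).
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Skip {}

    // Hypothetical test class with one method marked to be skipped.
    static class SampleTest {
        public void testPasses() {}

        @Skip
        public void testKnownBroken() {}
    }

    // True for public, no-argument methods named test*, i.e. the naming convention.
    static boolean looksLikeTest(Method m) {
        return Modifier.isPublic(m.getModifiers())
                && m.getParameterCount() == 0
                && m.getName().startsWith("test");
    }

    // Name-based selection: annotations are never consulted,
    // so a method marked @Skip is still selected.
    static List<String> selectByName(Class<?> testClass) {
        List<String> selected = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            if (looksLikeTest(m)) {
                selected.add(m.getName());
            }
        }
        Collections.sort(selected); // reflection order is unspecified
        return selected;
    }

    // Annotation-aware selection: a marker on the class or the method excludes it.
    static List<String> selectHonoringSkip(Class<?> testClass) {
        List<String> selected = new ArrayList<>();
        if (testClass.isAnnotationPresent(Skip.class)) {
            return selected;
        }
        for (Method m : testClass.getDeclaredMethods()) {
            if (looksLikeTest(m) && !m.isAnnotationPresent(Skip.class)) {
                selected.add(m.getName());
            }
        }
        Collections.sort(selected);
        return selected;
    }

    public static void main(String[] args) {
        System.out.println("name-based:       " + selectByName(SampleTest.class));
        System.out.println("annotation-aware: " + selectHonoringSkip(SampleTest.class));
    }
}
```

        In this model the name-based selection still picks up testKnownBroken, which mirrors the behaviour seen when the marked tests ran anyway. In real JUnit 4 the corresponding change would be to stop extending TestCase and annotate the test methods with @Test, so that a runner that understands @Ignore is used.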


          People

          • Assignee: Thomas Graves
          • Reporter: Eli Collins
          • Votes: 0
          • Watchers: 2
