Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The following test classes are failing on branch-1: TestJobTrackerRestartWithLostTracker, TestJobTrackerSafeMode, TestMiniMRMapRedDebugScript, TestRecoveryManager, TestTaskTrackerLocalization.

      I believe these tests were passing before the following recent MR changes:

      • MAPREDUCE-4012 Hadoop Job setup error leaves no useful info to users. (tgrav
      • MAPREDUCE-1238. mapred metrics shows negative count of waiting maps and redu
      • MAPREDUCE-4017. Add jobname to jobsummary log (tgraves and Koji Noguchi via
      • MAPREDUCE-4003. log.index (No such file or directory) AND Task process exit

        Activity

        Eli Collins created issue -
        Eli Collins added a comment -

        Here are details of the failures:

        Testcase: testRestartWithLostTracker took 163.223 sec
                FAILED
        Tracker killed while the jobtracker was down did not get lost upon restart expected:<0> but was:<1>
        junit.framework.AssertionFailedError: Tracker killed while the jobtracker was down did not get lost upon restart expected:<0> but was:<1>
                at org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.testRecoveryWithLostTracker(TestJobTrackerRestartWithLostTracker.java:112)
                at org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.testRestartWithLostTracker(TestJobTrackerRestartWithLostTracker.java:163)
        
        Testcase: testJobTrackerSafeMode took 99.434 sec
                FAILED
        JobTracker has opened up scheduling before all the trackers were recovered
        junit.framework.AssertionFailedError: JobTracker has opened up scheduling before all the trackers were recovered
                at org.apache.hadoop.mapred.TestJobTrackerSafeMode.testSafeMode(TestJobTrackerSafeMode.java:177)
                at org.apache.hadoop.mapred.TestJobTrackerSafeMode.testJobTrackerSafeMode(TestJobTrackerSafeMode.java:267)
        
        Testcase: testMapDebugScript took 89.257 sec
                FAILED
        null
        junit.framework.AssertionFailedError: null
                at org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript.testMapDebugScript(TestMiniMRMapRedDebugScript.java:212)
        
        Testcase: testRecoveryManager took 67.941 sec
                FAILED
        Recovery manager failed to tolerate job failures expected:<2> but was:<0>
        junit.framework.AssertionFailedError: Recovery manager failed to tolerate job failures expected:<2> but was:<0>
                at org.apache.hadoop.mapred.TestRecoveryManager.testRecoveryManager(TestRecoveryManager.java:250)
        
        Testcase: testRestartCount took 72.487 sec
                Caused an ERROR
        null
        java.lang.NullPointerException
                at org.apache.hadoop.mapred.TestRecoveryManager.testRestartCount(TestRecoveryManager.java:361)
        
        
        Testcase: testTrackerReinit took 0.038 sec
                Caused an ERROR
        Could not find any valid local directory for ttprivate/taskTracker/eli/jobcache/job_200907202331_0001/jobToken
        org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for ttprivate/taskTracker/eli/jobcache/job_200907202331_0001/jobToken
        
        Thomas Graves made changes -
        Field Original Value New Value
        Assignee Thomas Graves [ tgraves ]
        Thomas Graves added a comment -

        All of these tests are annotated with @Ignore and, when run individually, have been failing since long before the listed JIRAs.

        I was running them with: ant test -Dtestcase=foo so I'm not sure why that isn't honoring the @Ignore.

        Checking a jenkins output of running all the tests shows:

        [junit] Running org.apache.hadoop.mapred.TestRecoveryManager
        [junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.043 sec
        [junit] Running org.apache.hadoop.mapred.TestJobTrackerSafeMode
        [junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.042 sec
        ...

        I'm running a full test run now to verify that holds.

        Thomas Graves added a comment -

        I ran the tests on the 1.0.2 branch and the tests still fail:

        [junit] Test org.apache.hadoop.mapred.TestJobTrackerRestart FAILED (timeout)
        [junit] Running org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker
        [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 126.78 sec
        [junit] Test org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker FAILED
        [junit] Running org.apache.hadoop.mapred.TestJobTrackerSafeMode
        [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 75.936 sec

        Note that was using ant 1.8.2, I switched to use ant 1.7.1 and it honored the ignore:

        Testsuite: org.apache.hadoop.mapred.TestRecoveryManager
        Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.005 sec

        Eli, you might check the version of ant you are using. Please reopen if you think that isn't it.

        Thomas Graves made changes -
        Resolution Invalid [ 6 ]
        Status Open [ 1 ] Resolved [ 5 ]
        Eli Collins added a comment -

        Note that was using ant 1.8.2, I switched to use ant 1.7.1 and it honored the ignore:

        Thanks Thomas. I'm using ant 1.8.2 as well. Is this a known issue with ant?

        Eli Collins added a comment -

        Hm, actually, @Ignore works for me on hdfs tests using the same version of ant, wonder what's different about MR here.

        Thomas Graves added a comment -

        I remember the ant version thing coming up on the mailing list before but don't remember details.

        Perhaps it's a dependency version issue or something specific to MR, given that hdfs works. I'll poke a bit more to see if I can find anything.

        Tom White added a comment -

        I think this is happening because the tests are JUnit 3 tests (extending TestCase), but @Ignore is a JUnit 4 construct. Converting these tests to JUnit 4 tests should fix the problem. (I'm not sure what changed to cause these tests to start failing though.)
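
        The distinction described above can be sketched without a JUnit jar. This is a simplified model, not JUnit's or ant's actual discovery code: the nested Test and Ignore annotations and the two select methods are hypothetical stand-ins, assuming only that a JUnit 3 style runner picks methods by their "test" name prefix while a JUnit 4 style runner picks them by annotation.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class IgnoreDemo {
    // Hypothetical stand-ins for JUnit 4's @Test and @Ignore (assumption: no JUnit on the classpath).
    @Retention(RetentionPolicy.RUNTIME) @interface Test {}
    @Retention(RetentionPolicy.RUNTIME) @interface Ignore {}

    // Written JUnit 3 style: the method is found through its "test" name prefix.
    public static class LegacyTest {
        @Ignore public void testRecovery() {}
    }

    // Written JUnit 4 style: the method is found through its @Test annotation.
    public static class ConvertedTest {
        @Ignore @Test public void testRecovery() {}
    }

    // Name-based selection: annotations are never consulted, so @Ignore has no effect.
    static List<String> selectByName(Class<?> c) {
        List<String> selected = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.getName().startsWith("test") && m.getParameterCount() == 0) {
                selected.add(m.getName());
            }
        }
        return selected;
    }

    // Annotation-based selection: a method carrying @Ignore is dropped from the run.
    static List<String> selectByAnnotation(Class<?> c) {
        List<String> selected = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Test.class) && !m.isAnnotationPresent(Ignore.class)) {
                selected.add(m.getName());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println("name-based:       " + selectByName(LegacyTest.class));
        System.out.println("annotation-based: " + selectByAnnotation(ConvertedTest.class));
    }
}
```

        In this model the name-based selection still returns testRecovery despite the annotation, while the annotation-based selection returns nothing. Whether a given ant and JUnit combination takes the name-based or the annotation-based path is not established in this thread.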

        Transition: Open to Resolved
        Time In Source Status: 14h 57m
        Execution Times: 1
        Last Executer: Thomas Graves
        Last Execution Date: 12/Apr/12 22:05

          People

          • Assignee:
            Thomas Graves
            Reporter:
            Eli Collins
          • Votes:
            0
            Watchers:
            2
