Hadoop Map/Reduce
MAPREDUCE-153

TestJobInProgressListener sometimes times out

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Only one MR cluster is brought up, so there is no chance of job IDs clashing.

      Description

      It times out with "Could not find /taskTracker/jobcache/jobid/work in any of the configured local directories".

      1. MAPREDUCE-153-v1.0.patch
        14 kB
        Amar Kamat
      2. MAPREDUCE-153-v1.1.patch
        19 kB
        Amar Kamat
      3. MAPREDUCE-153-v1.1-branch-0.20.patch
        17 kB
        Amar Kamat


          Activity

          Amar Kamat added a comment -

          The only error message I could see was

           [junit] org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200905060050_0001/work in any of the configured local directories
              [junit]     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:381)
              [junit]     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
              [junit]     at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.localizeTask(TaskTracker.java:1888)
              [junit]     at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.launchTask(TaskTracker.java:2001)
              [junit]     at org.apache.hadoop.mapred.TaskTracker.launchTaskForJob(TaskTracker.java:880)
              [junit]     at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:874)
              [junit]     at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1739)
              [junit]     at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
              [junit]     at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1704)
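The message above comes from a lookup that tries a relative path under each configured local directory and fails only when none of them contains it. The following is a simplified sketch of that kind of lookup, not Hadoop's actual LocalDirAllocator code: the class LocalPathLookup and the method findLocalPathToRead are hypothetical, and FileNotFoundException stands in for DiskChecker$DiskErrorException.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified sketch of "find this relative path under one of
 * several local directories". Not Hadoop's LocalDirAllocator implementation.
 */
public class LocalPathLookup {

    /** Returns the first existing localDir/relativePath, or throws if none exists. */
    public static File findLocalPathToRead(String relativePath, List<String> localDirs)
            throws FileNotFoundException {
        for (String dir : localDirs) {
            File candidate = new File(dir, relativePath);
            if (candidate.exists()) {
                return candidate;
            }
        }
        // Stand-in for DiskChecker$DiskErrorException in the real code.
        throw new FileNotFoundException("Could not find " + relativePath
                + " in any of the configured local directories");
    }

    public static void main(String[] args) throws Exception {
        // Two temporary dirs acting as stand-ins for the mapred.local.dir entries.
        File d1 = Files.createTempDirectory("local1").toFile();
        File d2 = Files.createTempDirectory("local2").toFile();
        String present = "taskTracker/jobcache/job_200905060050_0001/work";
        new File(d2, present).mkdirs();
        List<String> dirs = Arrays.asList(d1.getPath(), d2.getPath());

        System.out.println("found: " + findLocalPathToRead(present, dirs));
        try {
            // A work dir that exists in no configured directory triggers the error.
            findLocalPathToRead("taskTracker/jobcache/job_200905060050_0002/work", dirs);
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```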
          
          Ravi Gummadi added a comment -

          I too observed the failure with trunk. It seems to fail if we run it 2 or 3 times.

          Amar Kamat added a comment -

          The problem occurs when the JobTrackers start within the same minute and the job IDs clash. Attaching a patch that runs all the tests with one mapred cluster. The runtime for this test is now 1m3s. Working on bringing it further down. Result of test-patch:
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 6 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          This is just a testcase change, so no ant test results are required.
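The job ID in the stack trace, job_200905060050_0001, shows the shape of the clash: the middle part is the JobTracker's start time truncated to the minute (2009-05-06 00:50), and the suffix is a per-JobTracker sequence number. A minimal sketch, assuming that format, of why two JobTrackers started in the same minute hand out the same first job ID; JobIdClash, trackerIdentifier and jobId are hypothetical names, not Hadoop APIs.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

/** Hypothetical illustration of minute-resolution job IDs colliding. */
public class JobIdClash {

    /** Minute-resolution identifier, e.g. "200905060050" for 2009-05-06 00:50. */
    public static String trackerIdentifier(Date startTime) {
        return new SimpleDateFormat("yyyyMMddHHmm").format(startTime);
    }

    /** Formats a job ID as job_<identifier>_<4-digit sequence number>. */
    public static String jobId(String trackerIdentifier, int sequence) {
        return String.format("job_%s_%04d", trackerIdentifier, sequence);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Align to the start of the current minute, then "start" two trackers 20s apart.
        long minuteStart = now - (now % 60_000L);
        Date first = new Date(minuteStart + 10_000L);
        Date second = new Date(minuteStart + 30_000L);

        // Each tracker's counter begins at 1, so the first job of each gets the same ID.
        String a = jobId(trackerIdentifier(first), 1);
        String b = jobId(trackerIdentifier(second), 1);
        System.out.println(a + " vs " + b + ", clash: " + a.equals(b));
    }
}
```

With a single shared cluster, as in the patch, only one JobTracker start time is used during the test run, so this particular collision cannot arise.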

          Amar Kamat added a comment -

          Attaching a patch that adds 2 more tests:

          1. Test listener events with 0 maps and 0 reducers with setup/cleanup
          2. Test listener events with 0 maps, 0 reducers and no setup/cleanup

          I have broken down the main test into subtests and made sure that the MiniMRCluster is brought up only once. The runtime of the testcase is now 1m19s. This is a testcase-only change, so no ant test results are required.

          Result of test-patch:
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 6 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Currently investigating whether unit tests can be written for this testcase.
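The "bring the cluster up once" approach can be sketched in plain Java as a lazily created, shared fixture. FakeCluster is a hypothetical stand-in for MiniMRCluster, and the three test methods are illustrative names, not the actual subtests in the patch; the start counter exists only to show that the cluster is created once.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of one expensive cluster shared by several subtests. */
public class SharedClusterSketch {

    /** Hypothetical stand-in for an expensive-to-start mini cluster. */
    public static class FakeCluster {
        public static int startCount = 0;
        public final List<String> jobsRun = new ArrayList<>();

        public FakeCluster() {
            startCount++; // a real cluster would start its daemons here
        }

        public void runJob(String name) {
            jobsRun.add(name);
        }

        public void shutdown() {
            // a real cluster would stop its daemons here
        }
    }

    private static FakeCluster cluster;

    /** Starts the shared cluster on first use; later calls reuse the same instance. */
    public static synchronized FakeCluster getCluster() {
        if (cluster == null) {
            cluster = new FakeCluster();
        }
        return cluster;
    }

    // Illustrative subtests: each uses the shared cluster instead of starting its own.
    public static void testJobFailure() { getCluster().runJob("failing job"); }
    public static void testJobKill()    { getCluster().runJob("killed job"); }
    public static void testJobSuccess() { getCluster().runJob("successful job"); }

    public static void main(String[] args) {
        testJobFailure();
        testJobKill();
        testJobSuccess();
        System.out.println("cluster starts: " + FakeCluster.startCount
                + ", jobs: " + getCluster().jobsRun);
        getCluster().shutdown();
    }
}
```

In a JUnit 3 suite, junit.extensions.TestSetup is one common way to get the same effect, starting the cluster in its setUp() and shutting it down in its tearDown().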

          Jothi Padmanabhan added a comment -

          A couple of minor suggestions:

          1. Can we use NullOutputFormat for the tests so that we avoid doing any output promotion?
          2. testQueuedJobKill can be done at the end; that way we could avoid one call to startInitializer.
          Amar Kamat added a comment -

          1. can we use NullOutputFormat for the tests so that we avoid doing any output promotions

          I think we can keep it as is and change it in a separate JIRA that deals with UtilsForTests.

          2. testQueuedJobKill can be done at the end - that way we could avoid one call to startInitializer

          I think I did it on purpose. I am creating only one MR cluster, and it is shared across the testcases. I think it is safer not to assume the state of the initializer before running a testcase, hence I forcefully stop and start the initializer. It is just a pair of thread start and stop calls.
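The reasoning above, not trusting whatever state a previous testcase left behind, amounts to an unconditional stop-then-start before each test. A minimal sketch, assuming the initializer is a single background thread; Initializer and resetInitializer are hypothetical names, not the classes used in TestJobInProgressListener.

```java
/** Hypothetical sketch of resetting a background worker before each test. */
public class RestartBeforeTest {

    /** Hypothetical background initializer driven by a single worker thread. */
    public static class Initializer {
        private Thread worker;
        private volatile boolean running;

        public synchronized void start() {
            running = true;
            worker = new Thread(() -> {
                while (running) {
                    try {
                        Thread.sleep(10); // pretend to poll for jobs to initialize
                    } catch (InterruptedException e) {
                        return; // interrupted by stop()
                    }
                }
            }, "initializer");
            worker.setDaemon(true);
            worker.start();
        }

        /** Safe to call whether or not the worker is currently running. */
        public synchronized void stop() throws InterruptedException {
            running = false;
            if (worker != null) {
                worker.interrupt();
                worker.join();
                worker = null;
            }
        }

        public synchronized boolean isAlive() {
            return worker != null && worker.isAlive();
        }
    }

    /** Unconditional stop-then-start, so every test begins with a fresh worker. */
    public static void resetInitializer(Initializer init) throws InterruptedException {
        init.stop();
        init.start();
    }

    public static void main(String[] args) throws InterruptedException {
        Initializer init = new Initializer();
        resetInitializer(init); // works from the never-started state
        resetInitializer(init); // and from the already-running state
        System.out.println("alive after reset: " + init.isAlive());
        init.stop();
        System.out.println("alive after stop: " + init.isAlive());
    }
}
```

The key property is that stop() is harmless when nothing is running, so the reset works no matter which test ran before.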

          Jothi Padmanabhan added a comment -

          The reason is because I am creating only one mr cluster and that is shared across the testcases. I think its safe not to assume the state of initializer before running the testcase hence I forcefully stop/start the initializer. Its a thread start and stop calls.

          I think this can be easily worked out. However, since the gain from removing one initializer thread start/stop is small, I am OK with the way things are.

          Sharad Agarwal added a comment -

          I committed this. Thanks Amar!

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #20 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/)

          Amar Kamat added a comment -

          Attaching a patch for branch-0.20.


            People

            • Assignee:
              Amar Kamat
              Reporter:
              Amar Kamat
            • Votes:
              0
              Watchers:
              1
