Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.20.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      TestQueueCapacities in trunk currently times out with the message "failed to fetch map-outputs". The stack trace is:

      2009-05-19 10:54:01,162 WARN org.apache.hadoop.mapred.ReduceTask: \
        attempt_200905191053_0001_r_000011_0 copy failed: attempt_200905191053_0001_m_000000_0 from localhost
      2009-05-19 10:54:01,163 WARN org.apache.hadoop.mapred.ReduceTask: java.io.FileNotFoundException: \
        http://localhost:54203/mapOutput?job=job_200905191053_0001&map=attempt_200905191053_0001_m_000000_0&reduce=11
              at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1241)
              at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1436)
              at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1353)
              at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1267)
              at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1199)
      
      1. thread-dump.txt
        106 kB
        steve_l
      2. HADOOP-5869-3.patch
        2 kB
        Sreekanth Ramakrishnan
      3. HADOOP-5869-3.20.patch
        2 kB
        Hemanth Yamijala
      4. HADOOP-5869-2.patch
        3 kB
        Sreekanth Ramakrishnan
      5. HADOOP-5869-1.patch
        1 kB
        Sreekanth Ramakrishnan
      6. hadoop-5869.patch
        0.5 kB
        Giridharan Kesavan

        Issue Links

          Activity

          Hemanth Yamijala added a comment -

          HADOOP-6064 is the duplicate. I am resolving this issue because, rather than fixing a different problem each time one shows up, I think it is better to simplify this test case drastically. That is the focus of HADOOP-6064. The other reason is that HADOOP-5869 is still a valid fix and solves a problem correctly; it is just that new symptoms are now showing up. Please watch HADOOP-6064 if you are interested.

          gary murry added a comment - - edited

          TestQueueCapacities timed out again. (http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/868/).

          Hemanth Yamijala added a comment -

          I committed this to trunk and the 0.20 branch. Thanks, Sreekanth!

          Hemanth Yamijala added a comment -

          Patch file for Hadoop 0.20. Only changes path to the test directory.

          Sreekanth Ramakrishnan added a comment -

          The failure in HDFS proxy is not related to the changes made in this patch. The test case fails with an IOException in Configuration.config, whereas the patch does not change anything related to configuration.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12410207/HADOOP-5869-3.patch
          against trunk revision 783059.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/486/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/486/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/486/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/486/console

          This message is automatically generated.

          Sreekanth Ramakrishnan added a comment -

          Attaching a patch incorporating Hemanth's comments.

          Hemanth Yamijala added a comment -

          I had a chat with Sreekanth on this one. We think the change in MiniMRCluster to remove waitTaskTrackers is an improvement that is not required for this bug fix. At the same time it is a useful improvement to consider (though the fix is possibly wrong and needs change). We will move this change to a separate JIRA to discuss further on the MiniMRCluster changes.

          Also, I don't see this patch on Hudson's queue, though it's been a while since it was submitted. I would recommend that tests be run locally on the new patch to verify it is working, and then I can commit it. The bug has been open for far too long.

          Sreekanth Ramakrishnan added a comment -

          Attaching a new patch correcting a few issues:

          • The tests were failing randomly because of a timing issue with speculative tasks being launched. Speculative execution is currently disabled in this patch.
          • The tests were timing out instead of failing on the assertion because MiniMRCluster.shutdown() calls waitTaskTrackers(). With controlled jobs the trackers do not become idle until we finish the tasks; once an assertion has failed that never happens, so we have to wait for the test to time out.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12409841/HADOOP-5869-1.patch
          against trunk revision 782083.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/474/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/474/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/474/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/474/console

          This message is automatically generated.

          Chris Douglas added a comment -

          My original comments were based on the results of a git-bisect from the last successful Hudson revision to HEAD (w/ 10 min timeout), identifying HADOOP-5792 as the first point where TestQueueCapacities started failing; dropping HADOOP-5792 from the tree identified HADOOP-4981 as a point where a failure resurfaced. I confirmed manually within the bisect branch before reopening the prenominate issues, but fresh subversion checkouts of the relevant revisions do not exhibit the behavior I reported previously. I must be mistaken. Thanks for doing the real work on this, Sreekanth.

          After applying the patch to a fresh checkout, the test still times out on my machine...

          Sreekanth Ramakrishnan added a comment -

          Ant test passed on my local box.

          Sreekanth Ramakrishnan added a comment -

          Same patch applies for branch 20.

          Sreekanth Ramakrishnan added a comment -

          Output from ant test-patch:

               [exec]
               [exec] -1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          

          The patch fixes a test case failure and the test actually tests the bug.

          Amareshwari Sriramadasu added a comment -

          +1 for the patch. The issue was due to HADOOP-5850. Thanks Sreekanth for finding out.

          Sreekanth Ramakrishnan added a comment -

          Attaching a patch to fix this issue. Without this patch the test case does not pass, and with it, it passes. Currently in TaskInProgress we check whether the task is a map task or a reduce task, and if the task is a setup or cleanup task we assign the appropriate split class and split bytes.

          Sreekanth Ramakrishnan added a comment -

          The test case currently times out in trunk because the framework always returns setup and cleanup tasks as map tasks, no matter which slot is free on the task tracker. This causes the setup and cleanup tasks to block, which in turn makes the ControlledMapReduceJob wait forever.
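
The blocking described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hadoop's scheduler code and not the attached patch: the class `SetupTaskSlotSketch`, the enum `SlotType`, and all three methods are hypothetical names invented for the illustration. It assumes a task can start only when its type matches the free slot on the tracker.

```java
public class SetupTaskSlotSketch {
    // Hypothetical model of the two kinds of slot on a task tracker.
    enum SlotType { MAP, REDUCE }

    // Behaviour described in the comment: the setup/cleanup task is
    // always handed out as a map task, whatever slot is free.
    static SlotType alwaysMap(SlotType freeSlot) {
        return SlotType.MAP;
    }

    // Alternative for comparison: hand out the setup/cleanup task in
    // the form that fits the slot that is actually free.
    static SlotType matchFreeSlot(SlotType freeSlot) {
        return freeSlot;
    }

    // Assumption of this model: a task starts only if its type
    // matches the free slot.
    static boolean canRun(SlotType taskType, SlotType freeSlot) {
        return taskType == freeSlot;
    }

    public static void main(String[] args) {
        // Suppose the map slots are all held by controlled map tasks,
        // so only a reduce slot is free.
        SlotType free = SlotType.REDUCE;
        System.out.println("always-map can run: " + canRun(alwaysMap(free), free));
        System.out.println("match-slot can run: " + canRun(matchFreeSlot(free), free));
    }
}
```

In this model the always-map policy leaves the setup or cleanup task unable to start while only a reduce slot is free, which is one way a controlled job could end up waiting indefinitely.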

          Giridharan Kesavan added a comment -

          This patch adds commons-cli-1.2 to the capacity scheduler's classpath.

          Sreekanth Ramakrishnan added a comment -

          Currently, the TestQueueCapacities and MiniMRCluster tests in the capacity scheduler are broken because of a class-not-found exception for Commons CLI.

          Chris Douglas added a comment -

          Two issues are responsible for this. The first is HADOOP-5792, which caused TestQueueCapacities to fail. Once reverted, HADOOP-4981 evidently introduces a second regression, which was ignored because it was assumed that trunk remained the sole cause of the failure. Note that TestQueueCapacities is also failing in the 0.20 branch.

          steve_l added a comment -

          I see this too. Here's a thread dump of the junit process while it appears to be waiting for something to finish.

          At the very least, the wait operation could use some timeouts so we'd get better output in the log.
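
A bounded wait of the kind suggested here might look roughly like the following. This is a minimal sketch, not the code in MiniMRCluster: the class `BoundedWait`, the method `waitFor`, and the conditions passed to it are hypothetical names made up for the example, which uses only the Java standard library.

```java
import java.util.function.BooleanSupplier;

public class BoundedWait {
    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition became true, false on timeout
    // or if the waiting thread was interrupted.
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up instead of waiting forever
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();

        // A condition that becomes true after roughly 100 ms.
        boolean idle = waitFor(() -> System.currentTimeMillis() - start > 100, 2_000, 20);
        System.out.println("became idle: " + idle);

        // A condition that never becomes true: the wait ends after the
        // timeout and the caller can log something useful.
        boolean stuck = waitFor(() -> false, 200, 20);
        if (!stuck) {
            System.err.println("Timed out after 200 ms waiting for trackers to become idle");
        }
    }
}
```

Returning a boolean lets the caller decide whether a timeout should fail the test or only be logged; either way the log shows what was being waited for, rather than the whole JUnit run timing out.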


            People

            • Assignee:
              Sreekanth Ramakrishnan
              Reporter:
              Sreekanth Ramakrishnan
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:

                Development