Hadoop Map/Reduce
MAPREDUCE-5060

Fetch failures that time out only count against the first map task

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.7, 2.1.0-beta
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Target Version/s:

      Description

      When a fetch failure happens after the socket has already "connected", it is counted only against the first map task. Most of the time, however, the cause is a problem with the node itself rather than the individual map task, so every failure that occurs while initiating the connection should count against all of the tasks.

      This caused a particularly unfortunate job to take an hour and a half longer than it needed to.

      1. MR-5060.txt
        5 kB
        Robert Joseph Evans
      2. MR-5060.txt
        2 kB
        Robert Joseph Evans
      3. MR-5060-trunk.txt
        5 kB
        Robert Joseph Evans

        Activity

        Robert Joseph Evans added a comment -

        This is a preliminary patch that removes the extra checks for connection established while setting up the connection to the NodeManager (NM). Only when the reducer actually starts reading map data does it associate a failure with a particular task.
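The attribution rule described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from Fetcher.java: the class `FetchFailureSketch`, the `Stage` enum, and the method `tasksToBlame` are hypothetical names invented for this example, and the map task IDs are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names are illustrative and not taken from Fetcher.java.
final class FetchFailureSketch {

    /** Stage of the shuffle fetch at which the I/O error occurred. */
    enum Stage { CONNECTING, READING_MAP_OUTPUT }

    /**
     * Returns the map task IDs that a fetch failure should be counted against.
     *
     * @param stage     where the failure happened
     * @param remaining map outputs still to be fetched from this host
     * @param current   the map output being read when the failure happened
     *                  (ignored while connecting)
     */
    static List<String> tasksToBlame(Stage stage, List<String> remaining, String current) {
        if (stage == Stage.CONNECTING) {
            // No map data has been read yet, so the problem is most likely the
            // node itself: count the failure against every requested map output.
            return new ArrayList<>(remaining);
        }
        // The failure happened while reading one specific map output:
        // count it against that task only.
        List<String> blamed = new ArrayList<>();
        blamed.add(current);
        return blamed;
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("map_0", "map_1", "map_2");
        // Timeout while setting up the connection: all three tasks are blamed.
        System.out.println(tasksToBlame(Stage.CONNECTING, pending, null));
        // Error while reading map_1: only map_1 is blamed.
        System.out.println(tasksToBlame(Stage.READING_MAP_OUTPUT, pending, "map_1"));
    }
}
```

With this split, a node that never responds accumulates failures for all of its map outputs at once, instead of one task per timeout.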

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12573398/MR-5060.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3405//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3405//console

        This message is automatically generated.

        Robert Joseph Evans added a comment -

        I ran all of the MR tests and they all still pass. I will look at adding in a test for this.

        Robert Joseph Evans added a comment -

        Includes a test for the timeout. Should be good to go.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12573410/MR-5060.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 tests included appear to have a timeout.

        -1 javac. The patch appears to cause the build to fail.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3407//console

        This message is automatically generated.

        Robert Joseph Evans added a comment -

        Sorry, the patch was against 0.23, and trunk had some refactoring. This patch applies to trunk; the previous patch is needed for branch-0.23 and any branch that does not have MAPREDUCE-4808.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12573417/MR-5060-trunk.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 tests included appear to have a timeout.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3409//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3409//console

        This message is automatically generated.

        Jason Lowe added a comment -

        +1 lgtm

        Jason Lowe added a comment -

        Thanks, Bobby. I committed this to trunk, branch-2, and branch-0.23.

        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3455 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3455/)
        MAPREDUCE-5060. Fetch failures that time out only count against the first map task. Contributed by Robert Joseph Evans (Revision 1455740)

        Result = SUCCESS
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455740
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #154 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/154/)
        MAPREDUCE-5060. Fetch failures that time out only count against the first map task. Contributed by Robert Joseph Evans (Revision 1455740)

        Result = SUCCESS
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455740
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #552 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/552/)
        MAPREDUCE-5060. Fetch failures that time out only count against the first map task. Contributed by Robert Joseph Evans (Revision 1455743)

        Result = UNSTABLE
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455743
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1343 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1343/)
        MAPREDUCE-5060. Fetch failures that time out only count against the first map task. Contributed by Robert Joseph Evans (Revision 1455740)

        Result = SUCCESS
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455740
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1371 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1371/)
        MAPREDUCE-5060. Fetch failures that time out only count against the first map task. Contributed by Robert Joseph Evans (Revision 1455740)

        Result = SUCCESS
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455740
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java

          People

          • Assignee:
            Robert Joseph Evans
            Reporter:
            Robert Joseph Evans
          • Votes:
            0
            Watchers:
            5
