Hadoop Map/Reduce / MAPREDUCE-2947

Sort fails on YARN+MR with lots of task failures

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: mrv2
    • Labels:
      None

      Description

      Karam Singh (the great man the world hardly knows about) found many failing tasks while running sort on a 350-node cluster. The failed tasks eventually failed the job, and this is happening consistently on the big cluster.

      Container launch failed for container_1315410418107_0002_01_002511 : RemoteTrace:
      java.lang.IllegalArgumentException
        at java.nio.Buffer.position(Buffer.java:218)
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:129)
        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
        at com.google.protobuf.ByteString.copyFrom(ByteString.java:108)
        at com.google.protobuf.ByteString.copyFrom(ByteString.java:117)
        at org.apache.hadoop.yarn.util.ProtoUtils.convertToProtoFormat(ProtoUtils.java:97)
        at org.apache.hadoop.yarn.api.records.ProtoBase.convertToProtoFormat(ProtoBase.java:59)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerResponsePBImpl.access$100(StartContainerResponsePBImpl.java:35)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerResponsePBImpl$1$1.next(StartContainerResponsePBImpl.java:134)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerResponsePBImpl$1$1.next(StartContainerResponsePBImpl.java:122)
        at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:319)
        at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerResponseProto$Builder.addAllServiceResponse(YarnServiceProtos.java:12620)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerResponsePBImpl.addServiceResponseToProto(StartContainerResponsePBImpl.java:144)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerResponsePBImpl.mergeLocalToBuilder(StartContainerResponsePBImpl.java:60)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerResponsePBImpl.mergeLocalToProto(StartContainerResponsePBImpl.java:68)
        at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerResponsePBImpl.getProto(StartContainerResponsePBImpl.java:52)
        at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:69)
        at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:337)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
      at LocalTrace:
      org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl:
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:151)
        at $Proxy20.startContainer(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:215)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

      1. MAPREDUCE-2947-20110907.txt
        6 kB
        Vinod Kumar Vavilapalli

          Activity

          Vinod Kumar Vavilapalli added a comment -

          I narrowed this down; it should have started happening after MAPREDUCE-2652 went in.

          Vinod Kumar Vavilapalli added a comment -

          Cloning the metadata from the AuxiliaryServices to prevent this.

          Also synchronizing access to all methods in StartContainerResponsePBImpl just to be safe.
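
          A minimal sketch of the idea behind this fix, assuming the shared metadata is a java.nio.ByteBuffer (the stack trace in the description goes through ByteBuffer.get). The names SharedBufferSketch, MetaHolder, getShared, getCopy and drain are hypothetical and are not taken from the patch; duplicate() is used here only as one way to give each reader its own position, and the patch itself may clone the metadata differently. The sketch is single-threaded: it shows only that a relative read consumes the shared position, which is the state that concurrent readers would otherwise fight over.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SharedBufferSketch {

    // Hypothetical stand-in for a service that hands out its metadata.
    static final class MetaHolder {
        private final ByteBuffer meta;

        MetaHolder(byte[] bytes) {
            this.meta = ByteBuffer.wrap(bytes);
        }

        // Unsafe: every caller shares one position/limit cursor.
        ByteBuffer getShared() {
            return meta;
        }

        // Safer: duplicate() shares the bytes but has an independent cursor.
        // synchronized so the cursor is not read while another thread is
        // inside this method (a "just to be safe" guard, as in the comment).
        synchronized ByteBuffer getCopy() {
            return meta.duplicate();
        }
    }

    // Relative bulk read: advances the buffer's position as a side effect.
    static byte[] drain(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = "shuffle-port".getBytes(StandardCharsets.UTF_8);

        // Shared cursor: the first reader consumes it, the second sees nothing.
        MetaHolder shared = new MetaHolder(payload);
        int a = drain(shared.getShared()).length;
        int b = drain(shared.getShared()).length;
        System.out.println("shared buffer: " + a + " then " + b);

        // Per-caller duplicate: each reader sees the full payload.
        MetaHolder copied = new MetaHolder(payload);
        int c = drain(copied.getCopy()).length;
        int d = drain(copied.getCopy()).length;
        System.out.println("duplicated buffer: " + c + " then " + d);
    }
}
```

          With two threads calling drain on the shared buffer at the same time, both can pass the remaining-bytes check before either advances the position, which is one plausible way to end up in Buffer.position with an illegal value as in the trace.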

          Vinod Kumar Vavilapalli added a comment -

          Will take Karam Singh's help to test this tomorrow. Eager committers, hold off!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12493351/MAPREDUCE-2947-20110907.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 14 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          org.apache.hadoop.yarn.server.nodemanager.TestNMAuditLogger

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-hs.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-shuffle.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-jobclient.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/642//console

          This message is automatically generated.

          Karam Singh added a comment -

          I am no longer seeing the error after applying the patch.

          Arun C Murthy added a comment -

          I just committed this. Thanks Vinod!

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #850 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/850/)
          MAPREDUCE-2947. Fixed race condition in AuxiliaryServices. Contributed by Vinod K V.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1166849
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerResponsePBImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #861 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/861/)
          MAPREDUCE-2947. Fixed race condition in AuxiliaryServices. Contributed by Vinod K V.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1166849
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerResponsePBImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #927 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/927/)
          MAPREDUCE-2947. Fixed race condition in AuxiliaryServices. Contributed by Vinod K V.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1166849
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerResponsePBImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #811 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/811/)
          MAPREDUCE-2947. Fixed race condition in AuxiliaryServices. Contributed by Vinod K V.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1166849
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerResponsePBImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #788 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/788/)
          MAPREDUCE-2947. Fixed race condition in AuxiliaryServices. Contributed by Vinod K V.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1166849
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerResponsePBImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java

            People

            • Assignee:
              Vinod Kumar Vavilapalli
              Reporter:
              Vinod Kumar Vavilapalli
            • Votes:
              0
              Watchers:
              2
