YARN-180: Capacity scheduler - containers that get reserved create container token too early

    Details

      Description

      The capacity scheduler can 'reserve' containers. Unfortunately, before it decides whether a container is reserved rather than assigned, it creates the Container object, which in turn creates a container token that expires in roughly 10 minutes by default.

      This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired.
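
      The ordering problem described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class, method, and field names below are hypothetical stand-ins and not the real LeafQueue or Container API, and the 10 minute lifetime is taken from the description. It shows a reservation that creates no token, with the token minted only at assignment so that its lifetime starts when the container can actually be launched.

```java
import java.util.Optional;

/** Simplified model (not the real LeafQueue): the token is minted only on assignment. */
public class DeferredTokenSketch {
    // Assumed lifetime, taken from the "roughly 10 minutes by default" in the description.
    static final long TOKEN_LIFETIME_MS = 10 * 60 * 1000L;

    /** Hypothetical stand-in for a container token: it only carries an expiry time. */
    static final class Token {
        final long expiresAtMs;
        Token(long expiresAtMs) { this.expiresAtMs = expiresAtMs; }
        boolean isExpired(long nowMs) { return nowMs > expiresAtMs; }
    }

    /** Hypothetical stand-in for a Container; it has no token while it is only reserved. */
    static final class Container {
        private Token token; // stays null until the container is assigned
        Optional<Token> token() { return Optional.ofNullable(token); }
    }

    /** Reserving creates the container but deliberately no token. */
    static Container reserve() {
        return new Container();
    }

    /** The token is created at assignment time, so its lifetime starts here. */
    static void assign(Container c, long nowMs) {
        c.token = new Token(nowMs + TOKEN_LIFETIME_MS);
    }

    public static void main(String[] args) {
        Container c = reserve();
        long assignedAt = 30 * 60 * 1000L;   // the node frees up 30 minutes after reservation
        assign(c, assignedAt);
        long launchAt = assignedAt + 1_000L; // the AM launches one second after assignment
        System.out.println("expired at launch: " + c.token().get().isExpired(launchAt));
    }
}
```

      With the token created in reserve() instead, the same 30 minute wait would leave the AM holding a token that expired 20 minutes before launch.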

      1. YARN-180-branch_0.23.patch
        3 kB
        Thomas Graves
      2. YARN-180.patch
        3 kB
        Robert Joseph Evans
      3. YARN-180.patch
        3 kB
        Arun C Murthy
      4. YARN-180.patch
        3 kB
        Arun C Murthy

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1236 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1236/)
          YARN-180. Capacity scheduler - containers that get reserved create container token to early (acmurthy and bobby) (Revision 1401703)

          Result = FAILURE
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401703
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1206 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1206/)
          YARN-180. Capacity scheduler - containers that get reserved create container token to early (acmurthy and bobby) (Revision 1401703)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401703
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #415 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/415/)
          YARN-180. Capacity scheduler - containers that get reserved create container token to early (acmurthy and bobby) (Revision 1401706)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401706
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #16 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/16/)
          YARN-180. Capacity scheduler - containers that get reserved create container token to early (acmurthy and bobby) (Revision 1401703)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401703
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
          Robert Joseph Evans added a comment -

          Thanks Arun for the patch, and thanks Tom for porting it to 0.23.

          I put this into trunk, branch-2, and branch-0.23.

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #2922 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2922/)
          YARN-180. Capacity scheduler - containers that get reserved create container token to early (acmurthy and bobby) (Revision 1401703)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401703
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
          Robert Joseph Evans added a comment -

          Thanks for the review Tom, I'll check it in now. Also the port to 0.23 looks clean, a simple refactoring, so +1 for that too.

          Thomas Graves added a comment -

          +1 for the latest patch. I manually tested this on a small cluster and verified that a container can be reserved for more than 10 minutes and that the AM can still start the container once it is finally allocated.

          Thomas Graves added a comment -

          Here is the same patch for branch-0.23, since the trunk patch didn't apply cleanly there.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550518/YARN-180.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/121//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/121//console

          This message is automatically generated.

          Robert Joseph Evans added a comment -

          Oh, I noticed that the containerToken is never assigned anyway. I will fix that too.

          Robert Joseph Evans added a comment -

          The patch looks mostly good. I am a bit confused by

          if (containerToken == null) {
            containerToken = null; // Try again later.
          }
          

          inside the new createContainerToken method. It is a copy and paste from before, but not needed any more.

          Other than that, it looks good. Since Arun is on a plane now, I will upload a new patch.

          Arun C Murthy added a comment -

          Unfortunately, there isn't a good way to unit-test this. Since the change is straightforward, a careful review should suffice.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550355/YARN-180.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/113//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/113//console

          This message is automatically generated.

          Vinod Kumar Vavilapalli added a comment -

          It looks like even after container assignment, there are some code paths (at least one right below the changes in the patch) where we don't return the assigned container. A simpler approach would be to generate the token when RMContainerImpl moves to the ACQUIRED state. Thoughts?
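
          The suggestion above can be sketched as a tiny state machine. This is an illustration of the idea only, assuming a reduced set of states and hypothetical class and method names; it is not the real RMContainerImpl, and the commit notifications in this issue list only LeafQueue.java as changed, so it should not be read as the committed fix.

```java
/** Simplified model of "create the token on the transition to ACQUIRED". */
public class AcquiredTokenSketch {
    // Reduced, illustrative set of states; the real state machine has more.
    enum State { NEW, RESERVED, ALLOCATED, ACQUIRED }

    // Assumed token lifetime of roughly 10 minutes.
    static final long TOKEN_LIFETIME_MS = 10 * 60 * 1000L;

    /** Hypothetical container record that tracks its state and token expiry. */
    static final class TrackedContainer {
        State state = State.NEW;
        Long tokenExpiryMs; // null until the AM acquires the container

        void reserve() { state = State.RESERVED; }

        void allocate() { state = State.ALLOCATED; }

        /** Only this transition creates a token, so earlier waits cannot age it. */
        void acquire(long nowMs) {
            if (state != State.ALLOCATED) {
                throw new IllegalStateException("cannot acquire from " + state);
            }
            state = State.ACQUIRED;
            tokenExpiryMs = nowMs + TOKEN_LIFETIME_MS;
        }
    }

    public static void main(String[] args) {
        TrackedContainer c = new TrackedContainer();
        c.reserve();                  // no token yet, however long the reservation lasts
        c.allocate();                 // still no token
        c.acquire(45 * 60 * 1000L);   // the AM picks the container up 45 minutes in
        System.out.println(c.state + " with token expiring at " + c.tokenExpiryMs + " ms");
    }
}
```

          One consequence of this design is that code paths that assign a container but never hand it to the AM never produce a token at all.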

          Arun C Murthy added a comment -

          Good catch, Tom! Here is a straightforward patch; it still needs some testing.

          Vinod Kumar Vavilapalli added a comment -

          Good catch, Thomas!

          Thomas Graves added a comment -

          Note that an expired container token causes the AM to fail the launch of the container with an error like:

          2012-10-20 10:27:15,702 ERROR ContainerLauncher #70 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1350066773975_81309_01_011780 : RemoteTrace:
          at LocalTrace:
          org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Unauthorized request to start container.
          This token is expired. current time is 1350728835262 found 1350717961434
            at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:156)
            at $Proxy30.startContainer(Unknown Source)
            at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:104)
            at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
            at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:390)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
            at java.lang.Thread.run(Thread.java:619)
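
          The two timestamps in the message are epoch milliseconds, and their difference shows how stale the token was: 1350728835262 - 1350717961434 = 10873828 ms, a little over three hours. The snippet below redoes that arithmetic with java.time. The class and method names are hypothetical, and the comparison is an assumption about the kind of check behind the message, not the NodeManager's actual code.

```java
import java.time.Duration;
import java.time.Instant;

/** Worked arithmetic for the "token is expired" message above. */
public class TokenExpiryCheck {
    /** Assumed form of the check: a token is expired once "now" is past its expiry. */
    static boolean isExpired(long nowMs, long expiryMs) {
        return nowMs > expiryMs;
    }

    /** How long ago the token expired, as a Duration. */
    static Duration overdueBy(long nowMs, long expiryMs) {
        return Duration.ofMillis(nowMs - expiryMs);
    }

    public static void main(String[] args) {
        long now = 1350728835262L;    // "current time is" from the log
        long expiry = 1350717961434L; // "found" from the log

        System.out.println("now (UTC):    " + Instant.ofEpochMilli(now));
        System.out.println("expiry (UTC): " + Instant.ofEpochMilli(expiry));
        System.out.println("expired:      " + isExpired(now, expiry));
        System.out.println("overdue by:   " + overdueBy(now, expiry));
    }
}
```

          If the token lifetime was the default of roughly 10 minutes, this container would have spent more than three hours between token creation and launch.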


            People

            • Assignee: Arun C Murthy
            • Reporter: Thomas Graves
            • Votes: 0
            • Watchers: 7
