  1. Hadoop YARN
  2. YARN-4764

Application submission fails when submitted queue is not available in scheduler xml

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      Available queues in capacity scheduler
      -root
      --queue1
      --queue2

      Submit application with queue3

      16/03/04 16:40:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1457077554812_1901
      16/03/04 16:40:08 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 3938 for mapred with renewer yarn)
      16/03/04 16:40:08 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication over rm2. Not retrying because try once and fail.
      java.lang.NullPointerException: java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:366)
              at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:289)
              at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:618)
              at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:252)
              at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:483)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2301)
      
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
              at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
              at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
              at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
              at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:272)
      
      

      Expected: the error should say that the queue doesn't exist.

      1. 0001-YARN-4764.patch
        4 kB
        Bibin A Chundatt
      2. 0002-YARN-4764.patch
        7 kB
        Bibin A Chundatt

        Issue Links

          Activity

          brahmareddy Brahma Reddy Battula added a comment - - edited

          Just linking the broken jira.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9444 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9444/)
          YARN-4764. Application submission fails when submitted queue is not (jianhe: rev 3c33158d1cb38ee4ab3baa21752a3cdf0bdc8ccc)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationACLs.java
          jianhe Jian He added a comment -

          Committed to trunk and branch-2. Thanks, Bibin A Chundatt!
          Thanks, Sunil G, for reviewing the patch!

          jianhe Jian He added a comment -

          Makes sense to me. Thank you for the summary!

          bibinchundatt Bibin A Chundatt added a comment -

          Sunil G/Jian He
          Thank you for discussing and summarizing. So the currently implemented patch is fine, right?

          sunilg Sunil G added a comment -

          Thanks Bibin A Chundatt for the analysis and suggestions.

          I will try to summarize the discussion so far. We agree that the current behavior is inconsistent, so the queue-non-existent scenario has to be handled the same way with and without ACLs. We had 2 options.

          1. For CS alone, the queue-non-existent check could be done in createAndPopulateRMApp. That said, it would then be common to apps with or without ACLs enabled. The app would be rejected before the RMApp is even created, hence audit logging would be needed.
          2. Or we could skip the ACL check if the queue is non-existent and pass the app on to the scheduler, so that it can send APP_REJECTED. This is in line with the old behavior. A minor drawback is that we already know the queue does not exist, yet we still hand the app to the scheduler for a known failure.

          I also had an offline talk with Jian He on this. Maybe we can go with Option 2 for now, which gives consistent behavior. But we need to improve here, so I can raise an improvement ticket so that all queue-related validation checks can be done in a new YarnScheduler API. We can see how much of it we can make common for Fair and CS too.
          Thoughts?
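
          The two options can be contrasted with a small sketch. Everything in it is illustrative rather than taken from the patch: the Outcome values, optionOne, optionTwo, and the QUEUES set are hypothetical names, and in the real code the logic is split between RMAppManager and the scheduler.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the two options; none of these names are Hadoop APIs.
final class QueueOptionsSketch {
  // Where a submission ends up for a given queue name.
  enum Outcome { ACCEPTED, REJECTED_BEFORE_APP_CREATED, REJECTED_BY_SCHEDULER }

  // Assumed set of configured queues, mirroring the issue description.
  static final Set<String> QUEUES =
      new HashSet<>(Arrays.asList("queue1", "queue2"));

  // Option 1: check for the queue up front, before any app record exists.
  // aclEnabled is accepted but unused: the check applies either way.
  static Outcome optionOne(String queue, boolean aclEnabled) {
    if (!QUEUES.contains(queue)) {
      return Outcome.REJECTED_BEFORE_APP_CREATED;
    }
    return Outcome.ACCEPTED;
  }

  // Option 2: skip the ACL check for a missing queue, create the app record,
  // and let the scheduler stage reject it.
  static Outcome optionTwo(String queue, boolean aclEnabled) {
    boolean queueExists = QUEUES.contains(queue);
    if (aclEnabled && queueExists) {
      // ACL evaluation would happen here; omitted in this sketch.
    }
    // The app record is created at this point, then handed to the scheduler.
    return queueExists ? Outcome.ACCEPTED : Outcome.REJECTED_BY_SCHEDULER;
  }

  public static void main(String[] args) {
    for (boolean acl : new boolean[] {true, false}) {
      System.out.println("acl=" + acl
          + " option1=" + optionOne("queue3", acl)
          + " option2=" + optionTwo("queue3", acl));
    }
  }
}
```

          In this model both options give the same result with ACLs on or off; they differ only in whether an app record exists by the time the rejection happens.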

          bibinchundatt Bibin A Chundatt added a comment -
          1. Also, only for the capacity scheduler would we be doing validation in createAndPopulateRMApp.
            I think APP_REJECTED is the better approach.
          bibinchundatt Bibin A Chundatt added a comment -

          Jian He

          1. The behaviour will not be the same as in previous versions.
          2. If we are planning to add validation in createAndPopulateRMApp, then cases like an invalid queue, submission to a parent queue, etc. need to be added in createAndPopulateRMApp too, and we end up adding the same set of checks in too many places.

          I think we should keep the old behaviour, and only the ACL check should be handled in createAndPopulateRMApp.

          jianhe Jian He added a comment -

          Is it better if we throw exception for CS in RMAppManager#createAndPopulateRMApp itself?

          I think it is fine to fail the createAndPopulateRMApp call itself.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime  Comment
          0     reexec      12m 38s  Docker mode activated.
          +1    @author     0m 0s    The patch does not contain any @author tags.
          +1    test4tests  0m 0s    The patch appears to include 1 new or modified test files.
          +1    mvninstall  6m 40s   trunk passed
          +1    compile     0m 25s   trunk passed with JDK v1.8.0_74
          +1    compile     0m 28s   trunk passed with JDK v1.7.0_95
          +1    checkstyle  0m 19s   trunk passed
          +1    mvnsite     0m 33s   trunk passed
          +1    mvneclipse  0m 15s   trunk passed
          +1    findbugs    1m 6s    trunk passed
          +1    javadoc     0m 21s   trunk passed with JDK v1.8.0_74
          +1    javadoc     0m 25s   trunk passed with JDK v1.7.0_95
          +1    mvninstall  0m 30s   the patch passed
          +1    compile     0m 27s   the patch passed with JDK v1.8.0_74
          +1    javac       0m 27s   the patch passed
          +1    compile     0m 26s   the patch passed with JDK v1.7.0_95
          +1    javac       0m 26s   the patch passed
          +1    checkstyle  0m 15s   the patch passed
          +1    mvnsite     0m 31s   the patch passed
          +1    mvneclipse  0m 12s   the patch passed
          +1    whitespace  0m 0s    Patch has no whitespace issues.
          +1    findbugs    1m 14s   the patch passed
          +1    javadoc     0m 19s   the patch passed with JDK v1.8.0_74
          +1    javadoc     0m 23s   the patch passed with JDK v1.7.0_95
          -1    unit        68m 18s  hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_74.
          -1    unit        68m 0s   hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95.
          +1    asflicense  0m 18s   Patch does not generate ASF License warnings.
                            165m 0s

          Reason                            Tests
          JDK v1.8.0_74 Failed junit tests  hadoop.yarn.server.resourcemanager.TestClientRMTokens
                                            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_95 Failed junit tests  hadoop.yarn.server.resourcemanager.TestClientRMTokens
                                            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12791775/0002-YARN-4764.patch
          JIRA Issue YARN-4764
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux cf2fa186d8c0 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / fd1c09b
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10717/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10717/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10717/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-YARN-Build/10717/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10717/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10717/console
          Powered by Apache Yetus 0.3.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Bibin A Chundatt and Jian He
          Is it better if we throw exception for CS in RMAppManager#createAndPopulateRMApp itself? Or is there any advantage other than having it treated as a failed application? Maybe an audit log can help in tracking such failures.

          bibinchundatt Bibin A Chundatt added a comment -

          Attaching a patch with a test case.
          Skipping the ACL check when csqueue==null so that the behaviour is the same as before.
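
          The guard described here can be sketched roughly as follows. This is a self-contained illustration, not the patch itself: SketchQueue, the queues map, aclCheckUnguarded, and aclCheckGuarded are hypothetical names, and it assumes the NullPointerException comes from dereferencing a null queue lookup during the ACL check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a scheduler queue; not a Hadoop class.
interface SketchQueue {
  boolean hasAccess(String user);
}

final class AclGuardSketch {
  // Assumed lookup table playing the role of the scheduler's queue registry.
  static final Map<String, SketchQueue> queues = new HashMap<>();
  static {
    queues.put("queue1", user -> true);
    queues.put("queue2", user -> user.equals("mapred"));
  }

  // Unguarded check: dereferences the lookup result, so an unknown
  // queue name throws NullPointerException.
  static boolean aclCheckUnguarded(String queueName, String user) {
    SketchQueue q = queues.get(queueName);
    return q.hasAccess(user);
  }

  // Guarded check: when the queue is unknown, skip the ACL check and let
  // a later stage (the scheduler, in this discussion) reject the app.
  static boolean aclCheckGuarded(String queueName, String user) {
    SketchQueue q = queues.get(queueName);
    if (q == null) {
      return true; // nothing to check here; rejection happens downstream
    }
    return q.hasAccess(user);
  }

  public static void main(String[] args) {
    System.out.println(aclCheckGuarded("queue3", "mapred"));
    try {
      aclCheckUnguarded("queue3", "mapred");
    } catch (NullPointerException e) {
      System.out.println("unguarded check hit a NullPointerException");
    }
  }
}
```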

          bibinchundatt Bibin A Chundatt added a comment -

          Jian He

          In the current approach we reject before the RMApp is created if ACLs are enabled, and if ACLs are not enabled we create the RMApp.
          Should we keep this consistent between ACLs enabled and disabled, and reject the app after the RMApp is created, similar to the old behaviour?

          bibinchundatt Bibin A Chundatt added a comment -

          Sunil G
          Thank you for confirming.
          Jian He
          Could you also please confirm.

          sunilg Sunil G added a comment -

          In line with the above comments, I think the current approach in the patch looks fine.

          sunilg Sunil G added a comment -

          Thanks Jian He for the clarification. Yes, APP_REJECTED exception will be thrown from CapacityScheduler#addApplication in this case.

          With ACLs enabled, as per the current code, in the "queue-not-exist" scenario we do not create an RMApp, and an AccessControlException is thrown directly from createAndPopulateRMApp. But with ACLs disabled, the RMApp is created and then APP_REJECTED is sent back from the scheduler (reservation/normal case).
          So there is a behavioral change as of now, and it was introduced in YARN-4522.

          Quoting the "queue-not-exist" error in non-acl scenario

          java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1457161239294_0002 to YARN : Application application_1457161239294_0002 submitted by user root to unknown queue: a1
          

          I think we can make the behavior the same in both cases. Also, looking at the code snippet below in createAndPopulateRMApp,

              UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(user);
              // Since FairScheduler queue mapping is done inside scheduler,
              // if FairScheduler is used and the queue doesn't exist, we should not
              // fail here because queue will be created inside FS. Ideally, FS queue
              // mapping should be done outside scheduler too like CS.
              // For now, exclude FS for the acl check.
              if (!isRecovery && YarnConfiguration.isAclEnabled(conf)
          

          it's better that we do NOT do a "non-existent-queue" check here. So I think the check currently done in the patch can go inside the if condition below.

               if (!isRecovery && YarnConfiguration.isAclEnabled(conf)
                  && scheduler instanceof CapacityScheduler) {
          

          And if the queue doesn't exist, we can create a clear message like "Application application_1457161239294_0002 submitted by user root to unknown queue: a1" and throw a YarnException directly. Thoughts?
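
          A minimal sketch of this proposal, under stated assumptions: SketchYarnException is a hypothetical stand-in for Hadoop's YarnException, validateQueue and knownQueues are made-up names, and the isCapacityScheduler flag stands in for the scheduler instanceof CapacityScheduler test, since FairScheduler may create the queue itself. The message text follows the "unknown queue" error quoted in this comment.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical checked exception standing in for YarnException.
class SketchYarnException extends Exception {
  SketchYarnException(String message) {
    super(message);
  }
}

final class UnknownQueueSketch {
  // Throws with a clear message when the queue is unknown, but only in the
  // CapacityScheduler-like case; other schedulers are left to decide later.
  static void validateQueue(boolean isCapacityScheduler,
      Set<String> knownQueues, String appId, String user, String queue)
      throws SketchYarnException {
    if (isCapacityScheduler && !knownQueues.contains(queue)) {
      throw new SketchYarnException("Application " + appId
          + " submitted by user " + user + " to unknown queue: " + queue);
    }
  }

  public static void main(String[] args) {
    Set<String> known = new HashSet<>(Arrays.asList("queue1", "queue2"));
    try {
      validateQueue(true, known,
          "application_1457161239294_0002", "root", "a1");
    } catch (SketchYarnException e) {
      System.out.println(e.getMessage());
    }
  }
}
```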

          bibinchundatt Bibin A Chundatt added a comment -

          IIUC, it's related to the changes in YARN-4571.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Do we know which patch broke this? We used to have a clear message for apps submitted to non-existing queues.

          jianhe Jian He added a comment -

          IIUC, the logic is: if the queue does not exist, resolveReservationQueueName will still return the queueName, and later on, in addApplication, the job will be marked as FAILED.

          sunilg Sunil G added a comment -

          CapacityScheduler#resolveReservationQueueName handles the reservation cases, and the reservationID is considered there. So there can be cases where the queue is not present in CS, but the reservationID is valid and the submission can go to that reservation queue. cc/Jian He, could you please share your thoughts on this?

          bibinchundatt Bibin A Chundatt added a comment -

          1. In case of queue mappings, we can submit without a queue. So I think we might break some feature here now? Thoughts?

          If a scheduler queue mapping is configured, the CSQueue shouldn't be null. For the other scenario I will wait for inputs.

          2. Do we need to send this as an RPC remote exception from here?

          It should be a YarnException, as it was a YarnException earlier too.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime  Comment
          0     reexec      0m 15s   Docker mode activated.
          +1    @author     0m 0s    The patch does not contain any @author tags.
          -1    test4tests  0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1    mvninstall  7m 0s    trunk passed
          +1    compile     0m 26s   trunk passed with JDK v1.8.0_74
          +1    compile     0m 28s   trunk passed with JDK v1.7.0_95
          +1    checkstyle  0m 18s   trunk passed
          +1    mvnsite     0m 36s   trunk passed
          +1    mvneclipse  0m 15s   trunk passed
          +1    findbugs    1m 8s    trunk passed
          +1    javadoc     0m 21s   trunk passed with JDK v1.8.0_74
          +1    javadoc     0m 26s   trunk passed with JDK v1.7.0_95
          +1    mvninstall  0m 30s   the patch passed
          +1    compile     0m 25s   the patch passed with JDK v1.8.0_74
          +1    javac       0m 25s   the patch passed
          +1    compile     0m 26s   the patch passed with JDK v1.7.0_95
          +1    javac       0m 26s   the patch passed
          +1    checkstyle  0m 16s   the patch passed
          +1    mvnsite     0m 32s   the patch passed
          +1    mvneclipse  0m 12s   the patch passed
          +1    whitespace  0m 0s    Patch has no whitespace issues.
          +1    findbugs    1m 15s   the patch passed
          +1    javadoc     0m 19s   the patch passed with JDK v1.8.0_74
          +1    javadoc     0m 23s   the patch passed with JDK v1.7.0_95
          -1    unit        71m 22s  hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_74.
          -1    unit        71m 55s  hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95.
          +1    asflicense  0m 18s   Patch does not generate ASF License warnings.
                            160m 3s

          Reason                            Tests
          JDK v1.8.0_74 Failed junit tests  hadoop.yarn.server.resourcemanager.TestClientRMTokens
                                            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_95 Failed junit tests  hadoop.yarn.server.resourcemanager.TestClientRMTokens
                                            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12791470/0001-YARN-4764.patch
          JIRA Issue YARN-4764
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 7a2fe0193530 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / cbd3132
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10709/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10709/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10709/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-YARN-Build/10709/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10709/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10709/console
          Powered by Apache Yetus 0.3.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          cc/ Jian He

          sunilg Sunil G added a comment -

          Yes Bibin A Chundatt. Thanks for the analysis.

              if (!isRecovery && YarnConfiguration.isAclEnabled(conf)
                  && scheduler instanceof CapacityScheduler &&
                  !authorizer.checkPermission(new AccessRequest(
                      ((CapacityScheduler) scheduler)
                          .getQueue(submissionContext.getQueue()).getPrivilegedEntity(),
                      userUgi, SchedulerUtils.toAccessType(QueueACL.SUBMIT_APPLICATIONS),
                      submissionContext.getApplicationId().toString(),
                      submissionContext.getApplicationName()))
          

          When ACLs are enabled in the cluster, as you mentioned, the above code hits an NPE because the queue is not present, and that exception is currently thrown straight out to the client. I don't think this is the right way to handle a non-existent queue in the ACL scenario.

          Meanwhile, this patch handles the case explicitly and responds with an exception. There are two points to consider:
          1. With queue mappings, an application can be submitted without specifying a queue. So could this change break that feature? Thoughts?
          2. Do we need to send this as an RPC remote exception from here?

          Please correct me if I am wrong.

          bibinchundatt Bibin A Chundatt added a comment -

          Sunil G
          For the existing ACL part of the code, only the formatting was changed.

          bibinchundatt Bibin A Chundatt added a comment -

          Sunil G

          The problem here is the following:

          Application submission fails with an NPE when the queue is not available in capacity-scheduler.xml.

          Analysis

          To check the user's permission on the queue, we fetch the CSQueue object for the queue named in the submission context.
          As explained in the issue, the CSQueue for queue3 is null because it is not configured in capacity-scheduler.xml, so ((CapacityScheduler) scheduler).getQueue(submissionContext.getQueue()).getPrivilegedEntity() causes the NPE.

          With CSQueue csqueue = ((CapacityScheduler) scheduler).getQueue(queueName), the null check is done here too.
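
          The failure mode described in the analysis above can be reproduced in isolation. The following is a minimal, self-contained sketch and not the actual patch: Queue, Scheduler, unsafeEntity and safeEntity are hypothetical stand-ins for CSQueue, CapacityScheduler and the lookup in RMAppManager, and the choice of IllegalArgumentException is an assumption made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class QueueLookupSketch {

  /** Hypothetical stand-in for a configured scheduler queue (not CSQueue). */
  static final class Queue {
    private final String name;

    Queue(String name) {
      this.name = name;
    }

    String getPrivilegedEntity() {
      return "queue:" + name;
    }
  }

  /** Hypothetical stand-in for the scheduler's queue registry. */
  static final class Scheduler {
    private final Map<String, Queue> queues = new HashMap<>();

    void addQueue(String name) {
      queues.put(name, new Queue(name));
    }

    /** Returns null when the queue is not configured. */
    Queue getQueue(String name) {
      return queues.get(name);
    }
  }

  /** Chained lookup: throws NullPointerException for an unknown queue. */
  static String unsafeEntity(Scheduler scheduler, String queueName) {
    return scheduler.getQueue(queueName).getPrivilegedEntity();
  }

  /** Guarded lookup: reports the missing queue with a descriptive message. */
  static String safeEntity(Scheduler scheduler, String queueName) {
    Queue queue = scheduler.getQueue(queueName);
    if (queue == null) {
      // Assumption: the exception type here is illustrative only.
      throw new IllegalArgumentException(
          "Queue " + queueName + " is not configured in the scheduler");
    }
    return queue.getPrivilegedEntity();
  }

  public static void main(String[] args) {
    Scheduler scheduler = new Scheduler();
    scheduler.addQueue("queue1");
    scheduler.addQueue("queue2");

    System.out.println(safeEntity(scheduler, "queue1"));
    try {
      safeEntity(scheduler, "queue3");
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

          The point of the guard is only that the lookup result is checked before it is dereferenced, so the client sees a message naming the missing queue instead of a bare NullPointerException.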

          sunilg Sunil G added a comment -

          Hi Bibin A Chundatt
          I am trying to understand the patch. Please correct me if I am wrong.

          Here the user submitted an app to a non-existent queue, and this patch performs the ACL check for the user against that queue, so an AccessControlException will be thrown in this case. But for a non-existent queue, do we need this exception? I think it could be a YarnException.

          sunilg Sunil G added a comment -

          Sorry, my bad. You are talking about suppressing the exception on the user console, since it is currently too verbose. Got it.

          sunilg Sunil G added a comment -

          Is this in trunk? Please share more information, with config details if possible.


            People

            • Assignee:
              bibinchundatt Bibin A Chundatt
              Reporter:
              bibinchundatt Bibin A Chundatt
            • Votes:
              0
              Watchers:
              10
