Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: client, resourcemanager
    • Labels:
      None

      Description

      This JIRA tracks the design/implementation of the layer for routing ApplicaitonClientProtocol requests to the appropriate
      RM(s) in a federated YARN cluster.

      1. YARN-3659.pdf
        156 kB
        Giovanni Matteo Fumarola
      2. YARN-3659-YARN-2915.1.patch
        84 kB
        Giovanni Matteo Fumarola
      3. YARN-3659-YARN-2915.2.patch
        109 kB
        Giovanni Matteo Fumarola
      4. YARN-3659-YARN-2915.draft.patch
        58 kB
        Giovanni Matteo Fumarola

        Issue Links

          Activity

          Hide
          luhuichun luhuichun added a comment - - edited

          Giovanni Matteo Fumarola Hi Giovanni, I'd like to contribute and work on this, if you haven't started working on it yet. Thank you.

          Show
          luhuichun luhuichun added a comment - - edited Giovanni Matteo Fumarola Hi Giovanni, I'd like to contribute and work on this, if you haven't started working on it yet. Thank you.
          Hide
          subru Subru Krishnan added a comment -

          Thanks luhuichun for offering to help! We have a prototype code which you can see the in the uber patch here , which Giovanni Matteo Fumarola has been polishing/testing. He's close to posting the patch, so will be good if you help out with the review etc.

          Show
          subru Subru Krishnan added a comment - Thanks luhuichun for offering to help! We have a prototype code which you can see the in the uber patch here , which Giovanni Matteo Fumarola has been polishing/testing. He's close to posting the patch, so will be good if you help out with the review etc.
          Hide
          giovanni.fumarola Giovanni Matteo Fumarola added a comment -

          Attached (draft). There are multiple edited files under the package org.apache.hadoop.yarn.server.federation.policies.router due to introduction of a blacklist concept. Should we create a new jira?

          Show
          giovanni.fumarola Giovanni Matteo Fumarola added a comment - Attached (draft). There are multiple edited files under the package org.apache.hadoop.yarn.server.federation.policies.router due to introduction of a blacklist concept. Should we create a new jira?
          Hide
          subru Subru Krishnan added a comment - - edited

          Thanks Giovanni Matteo Fumarola for working on this. I looked at your patch, please find my comments below:

          • A general note on Javadocs - missing code/link annotations throughout.
          • Can you add some documentation for ROUTER_CLIENTRM_SUBMIT_RETRY. You will also need to exclude it in TestYarnConfigurationFields.
          • Is it possible to move the check of subcluster list being empty to AbstractRouterPolicy to prevent duplication?
          • A note on isRunning field in MockResourceManagerFacade will be useful.
          • Rename NO_SUBCLUSTER_MESSAGE to NO_ACTIVE_SUBCLUSTER_AVAILABLE and update the value accordingly.
          • Why is the visibility of RouterUtil public? I feel you can also rename it to RouterServerUtil.
          • Resolution of UGI can be moved to AbstractClientRequestInterceptor as again all child classes are duplicating the same code.
          • Would it be better to use LRUCacheHashMap for clientRMProxies?
          • Rename getApplicationClientProtocolFromSubClusterId --> getClientRMProxyForSubCluster.
          • Distinguish the log message for different exceptions for clarity.
          • Use the UniformRandomRouterPolicy to determine a random subcluster as that's the purpose of the policy.
          •  Client: behavior as YARN. should be Client: identical behavior as {@code ClientRMService}. 
          • Documentation for exception handling in submitApplication is unclear, please rephrase it.
          • Rename aclTemp to clientRMProxy.
          • "Unable to create a new application to SubCluster " should be " No response when attempting to submit the application " + applicationId + "to SubCluster " + subClusterId.getId() 

            .

          • The log statement for app submission can be moved to post successful submission as it's redundant now.
          • We should forward any exceptions we get on forceKillApplication and getApplicationReport to client.
          • Have a note upfront saying we have only implemented the core functionalities and rest will be done in a follow up JIRA (create and link).
          • Use FederationStateStoreTestUtil for helper methods like registerSubCluster/deRegisterSubCluster/registerSubClusters.
          • In TestFederationClientInterceptorRetry, assert that getApplicationHomeSubCluster is the good subCluster.
          Show
          subru Subru Krishnan added a comment - - edited Thanks Giovanni Matteo Fumarola for working on this. I looked at your patch, please find my comments below: A general note on Javadocs - missing code/link annotations throughout. Can you add some documentation for ROUTER_CLIENTRM_SUBMIT_RETRY. You will also need to exclude it in TestYarnConfigurationFields . Is it possible to move the check of subcluster list being empty to AbstractRouterPolicy to prevent duplication? A note on isRunning field in MockResourceManagerFacade will be useful. Rename NO_SUBCLUSTER_MESSAGE to NO_ACTIVE_SUBCLUSTER_AVAILABLE and update the value accordingly. Why is the visibility of RouterUtil public? I feel you can also rename it to RouterServerUtil. Resolution of UGI can be moved to AbstractClientRequestInterceptor as again all child classes are duplicating the same code. Would it be better to use LRUCacheHashMap for clientRMProxies ? Rename getApplicationClientProtocolFromSubClusterId --> getClientRMProxyForSubCluster . Distinguish the log message for different exceptions for clarity. Use the UniformRandomRouterPolicy to determine a random subcluster as that's the purpose of the policy. Client: behavior as YARN. should be Client: identical behavior as {@code ClientRMService}. Documentation for exception handling in submitApplication is unclear, please rephrase it. Rename aclTemp to clientRMProxy . "Unable to create a new application to SubCluster " should be " No response when attempting to submit the application " + applicationId + "to SubCluster " + subClusterId.getId() . The log statement for app submission can be moved to post successful submission as it's redundant now. We should forward any exceptions we get on forceKillApplication and getApplicationReport to client. Have a note upfront saying we have only implemented the core functionalities and rest will be done in a follow up JIRA (create and link). Use FederationStateStoreTestUtil for helper methods like registerSubCluster/deRegisterSubCluster/registerSubClusters . In TestFederationClientInterceptorRetry , assert that getApplicationHomeSubCluster is the good subCluster.
          Hide
          giovanni.fumarola Giovanni Matteo Fumarola added a comment - - edited

          Thanks Subru Krishnan for the feedbacks. Attached V2 that fixes them. Answers in order.

          • Done.
          • Done.
          • Right now AbstractRouterPolicy has no the list of the active. The validation is done in the FederationPolicyUtils. Add a respective test.
          • Done.
          • Done.
          • Done.
          • Done.
          • I use the ConcurrentHashMap for multithreading reasons.
          • Done.
          • Done.
          • After an offline discussion, we decided to ignore this comment. The current policy does not allow to use the logic itself without initialization.
          • Done.
          • Done.
          • Done.
          • Done.
          • Done.
          • Done.
          Show
          giovanni.fumarola Giovanni Matteo Fumarola added a comment - - edited Thanks Subru Krishnan for the feedbacks. Attached V2 that fixes them. Answers in order. Done. Done. Right now AbstractRouterPolicy has no the list of the active. The validation is done in the FederationPolicyUtils . Add a respective test. Done. Done. Done. Done. I use the ConcurrentHashMap for multithreading reasons. Done. Done. After an offline discussion, we decided to ignore this comment. The current policy does not allow to use the logic itself without initialization. Done. Done. Done. Done. Done. Done.
          Hide
          subru Subru Krishnan added a comment - - edited

          Thanks Giovanni Matteo Fumarola for addressing my feedback. The latest patch LGTM (pending Yetus), have a couple of nits:

          • setupUser can be done before initializing the next interceptor in AbstractClientRequestInterceptor.
          • A few comments on FederationClientInterceptor::submitApplication:
            • Avoid the continues by switching the checks, i.e. return response if not null, otherwise log warn followed by completion of loop iteration.
            • We should throw exception instead of returning nulls to be consistent with ClientRMService.
            • Fix the log statements based on the above changes.
          Show
          subru Subru Krishnan added a comment - - edited Thanks Giovanni Matteo Fumarola for addressing my feedback. The latest patch LGTM (pending Yetus), have a couple of nits: setupUser can be done before initializing the next interceptor in AbstractClientRequestInterceptor . A few comments on FederationClientInterceptor::submitApplication : Avoid the continues by switching the checks, i.e. return response if not null, otherwise log warn followed by completion of loop iteration. We should throw exception instead of returning nulls to be consistent with ClientRMService . Fix the log statements based on the above changes.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 9 new or modified test files.
          0 mvndep 1m 45s Maven dependency ordering for branch
          +1 mvninstall 17m 34s YARN-2915 passed
          +1 compile 9m 22s YARN-2915 passed
          +1 checkstyle 1m 0s YARN-2915 passed
          +1 mvnsite 2m 8s YARN-2915 passed
          +1 findbugs 3m 37s YARN-2915 passed
          +1 javadoc 1m 44s YARN-2915 passed
          0 mvndep 0m 11s Maven dependency ordering for patch
          +1 mvninstall 1m 31s the patch passed
          +1 compile 5m 28s the patch passed
          +1 javac 5m 28s the patch passed
          -0 checkstyle 0m 54s hadoop-yarn-project/hadoop-yarn: The patch generated 10 new + 225 unchanged - 1 fixed = 235 total (was 226)
          +1 mvnsite 2m 0s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 xml 0m 1s The patch has no ill-formed XML file.
          -1 findbugs 0m 38s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
          +1 javadoc 1m 39s the patch passed
          +1 unit 0m 32s hadoop-yarn-api in the patch passed.
          +1 unit 2m 25s hadoop-yarn-common in the patch passed.
          +1 unit 1m 16s hadoop-yarn-server-common in the patch passed.
          +1 unit 0m 27s hadoop-yarn-server-router in the patch passed.
          +1 asflicense 0m 34s The patch does not generate ASF License warnings.
          66m 56s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router
            Read of unwritten field user in org.apache.hadoop.yarn.server.router.clientrm.DefaultClientRequestInterceptor.init(String) At DefaultClientRequestInterceptor.java:in org.apache.hadoop.yarn.server.router.clientrm.DefaultClientRequestInterceptor.init(String) At DefaultClientRequestInterceptor.java:[line 109]
            Field only ever set to null:null: org.apache.hadoop.yarn.server.router.clientrm.DefaultClientRequestInterceptor.user In DefaultClientRequestInterceptor.java
            Dead store to subClustersActive in org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getNewApplication(GetNewApplicationRequest) At FederationClientInterceptor.java:org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getNewApplication(GetNewApplicationRequest) At FederationClientInterceptor.java:[line 231]
            Write to static field org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.numSubmitRetries from instance method org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.init(String) At FederationClientInterceptor.java:from instance method org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.init(String) At FederationClientInterceptor.java:[line 153]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue YARN-3659
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12874336/YARN-3659-YARN-2915.2.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux a5289b70e995 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision YARN-2915 / ff2f39a
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/16240/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/16240/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/16240/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.html
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16240/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/16240/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 15s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 9 new or modified test files. 0 mvndep 1m 45s Maven dependency ordering for branch +1 mvninstall 17m 34s YARN-2915 passed +1 compile 9m 22s YARN-2915 passed +1 checkstyle 1m 0s YARN-2915 passed +1 mvnsite 2m 8s YARN-2915 passed +1 findbugs 3m 37s YARN-2915 passed +1 javadoc 1m 44s YARN-2915 passed 0 mvndep 0m 11s Maven dependency ordering for patch +1 mvninstall 1m 31s the patch passed +1 compile 5m 28s the patch passed +1 javac 5m 28s the patch passed -0 checkstyle 0m 54s hadoop-yarn-project/hadoop-yarn: The patch generated 10 new + 225 unchanged - 1 fixed = 235 total (was 226) +1 mvnsite 2m 0s the patch passed -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply +1 xml 0m 1s The patch has no ill-formed XML file. -1 findbugs 0m 38s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) +1 javadoc 1m 39s the patch passed +1 unit 0m 32s hadoop-yarn-api in the patch passed. +1 unit 2m 25s hadoop-yarn-common in the patch passed. +1 unit 1m 16s hadoop-yarn-server-common in the patch passed. +1 unit 0m 27s hadoop-yarn-server-router in the patch passed. +1 asflicense 0m 34s The patch does not generate ASF License warnings. 66m 56s Reason Tests FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router   Read of unwritten field user in org.apache.hadoop.yarn.server.router.clientrm.DefaultClientRequestInterceptor.init(String) At DefaultClientRequestInterceptor.java:in org.apache.hadoop.yarn.server.router.clientrm.DefaultClientRequestInterceptor.init(String) At DefaultClientRequestInterceptor.java: [line 109]   Field only ever set to null:null: org.apache.hadoop.yarn.server.router.clientrm.DefaultClientRequestInterceptor.user In DefaultClientRequestInterceptor.java   Dead store to subClustersActive in org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getNewApplication(GetNewApplicationRequest) At FederationClientInterceptor.java:org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getNewApplication(GetNewApplicationRequest) At FederationClientInterceptor.java: [line 231]   Write to static field org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.numSubmitRetries from instance method org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.init(String) At FederationClientInterceptor.java:from instance method org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.init(String) At FederationClientInterceptor.java: [line 153] Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue YARN-3659 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12874336/YARN-3659-YARN-2915.2.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux a5289b70e995 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision YARN-2915 / ff2f39a Default Java 1.8.0_131 findbugs v3.1.0-RC1 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/16240/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt whitespace https://builds.apache.org/job/PreCommit-YARN-Build/16240/artifact/patchprocess/whitespace-eol.txt findbugs https://builds.apache.org/job/PreCommit-YARN-Build/16240/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.html Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16240/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/16240/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.

            People

            • Assignee:
              giovanni.fumarola Giovanni Matteo Fumarola
              Reporter:
              giovanni.fumarola Giovanni Matteo Fumarola
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:

                Development