Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.1.0-beta
Description
Support HA admin operations to facilitate transitioning the RM to Active and Standby states.
Attachments
- yarn-1068-prelim.patch (9 kB) - Karthik Kambatla
- yarn-1068-9.patch (26 kB) - Karthik Kambatla
- yarn-1068-8.patch (26 kB) - Karthik Kambatla
- yarn-1068-7.patch (16 kB) - Karthik Kambatla
- yarn-1068-6.patch (16 kB) - Karthik Kambatla
- yarn-1068-5.patch (16 kB) - Karthik Kambatla
- yarn-1068-4.patch (16 kB) - Karthik Kambatla
- yarn-1068-3.patch (16 kB) - Karthik Kambatla
- yarn-1068-2.patch (10 kB) - Karthik Kambatla
- yarn-1068-14.patch (48 kB) - Karthik Kambatla
- yarn-1068-13.patch (47 kB) - Karthik Kambatla
- yarn-1068-12.patch (41 kB) - Karthik Kambatla
- yarn-1068-11.patch (31 kB) - Karthik Kambatla
- yarn-1068-10.patch (29 kB) - Karthik Kambatla
- yarn-1068-1.patch (9 kB) - Karthik Kambatla
- YARN-1068.Karthik.patch (48 kB) - Bikas Saha
Issue Links
- is blocked by: YARN-1027 Implement RMHAProtocolService (Closed)
Activity
FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/)
YARN-1068. Add admin support for HA operations (Karthik Kambatla via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536888)
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
- /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/authorize/RMPolicyProvider.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/378/)
YARN-1068. Add admin support for HA operations (Karthik Kambatla via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536888)
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
- /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/authorize/RMPolicyProvider.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
Resolving this, as bikassaha committed it to trunk and branch-2.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12610924/YARN-1068.Karthik.patch
against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2311//console
This message is automatically generated.
SUCCESS: Integrated in Hadoop-trunk-Commit #4668 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4668/)
YARN-1068. Add admin support for HA operations (Karthik Kambatla via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536888)
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
- /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/authorize/RMPolicyProvider.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
Uploading a patch with minor edits to add comments about the dependency on YARN-1318.
Haven't looked at the patch carefully, but okay with pushing out the service merge assuming YARN-1318 is a blocker for the next release that ships the fail-over work.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12610366/yarn-1068-14.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2287//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2287//console
This message is automatically generated.
Updated patch to add a comment on why RMHAProtocolService is Private-Unstable and a pointer to YARN-1318.
vinodkv, does the new patch address your concerns?
I have marked YARN-1318 as a blocker for YARN-149. We must fix that before failover is available. Karthik, in your final patch can you please include clear comments pointing to YARN-1318 near the @Private annotations for RMHAProtocolService. Thanks!
Tested the latest patch again on a real cluster.
bikassaha, vinodkv - I believe the patch is ready for review. Please take a look when you get a chance.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12609970/yarn-1068-13.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2272//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2272//console
This message is automatically generated.
Updated patch extends RMAdminCLI to support admin operations.
Todo: Add unit tests.
Merging the CLI now makes any subsequent changes transparent to the user. Makes sense. Will work on the patch and report back.
Spoke to Vinod offline. His suggestion is to make sure that the admin CLI is merged within this patch. We may defer the AdminService merge and make that a blocker for making RMHAProtocolService public. Also, we should not expose the port configs in yarn-default.xml.
Thanks Bikas. I agree with your assessment: moving forward with the remaining failover work, while merging RMHAProtocolService and AdminService in the background, makes sense to me.
I'll post a patch shortly to reflect the same.
I haven't looked deeply at what it would take to make AdminService always-on. From your comment in YARN-1318, it looks like significant surgery. In the interest of moving end-to-end RM failover forward with the crucial ZKFC integration work and other testing, can we mark RMHAProtocolService private and unstable and move on with the integration work? The merger of RMHAProtocolService and AdminService can then be made a blocker for making RMHAProtocolService public/stable. Does this work?
Thanks vinodkv. I do agree that we should avoid creating new RPC servers and adding more ports to the configuration.
Created YARN-1318 to make AdminService an Always-On service. Unfortunately, that requires significant changes to ResourceManager and RMContext as well. https://issues.apache.org/jira/browse/YARN-1318?focusedCommentId=13798358&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13798358 provides a gist of the required changes.
vinodkv, bikassaha, can you please take a look at YARN-1318 as well and provide your thoughts?
IIUC, the suggestion is to use the RPC server from AdminService. AdminService currently is an Active service and not an Always-On service, so it doesn't start until the RM transitions to Active. Moving the AdminService to Always-On requires defining the semantics when the RM is Standby.
I'd do this instead of adding a new service. For now as well as in the long term, we want to deny all the existing AdminService operations on Standby. Doing it by not stopping the server or by explicitly rejecting the requests is an implementation detail and not a big change in semantics.
Repeating what I said, we originally added AdminService separately from client-service only for prioritizing admin operations. No need for a new server for this.
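For illustration, explicitly rejecting requests on a standby RM as discussed above could look roughly like this minimal sketch (not from the patch; it assumes org.apache.hadoop.ipc.StandbyException and a hypothetical isRMActive() helper on the service):

  // Illustrative guard an always-on AdminService could apply to each operation.
  private void rejectIfStandby(String operation) throws StandbyException {
    if (!isRMActive()) {
      throw new StandbyException(
          "ResourceManager is in standby state; rejecting " + operation);
    }
  }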
To do this, we need to have RMAdminCLI extend HAAdmin, and augment the run() method to call super.run() when applicable, and the usage needs to be augmented to include the HAAdmin usage.
Yes. I guess there is no argument here other than stating the obvious.
YARN expects the actual PB/PBImpl files to be at a particular location, and can't find the corresponding files when using HAServiceProtocol from common. Hence, had to use PB interfaces.
HAServiceProtocolPB is the PB interface, and there seems to be no PBImpl; Common/HDFS follow a different pattern from YARN's, and the last I heard, they liked YARN's PB impl approach. In any case, +1 to skip using YarnRPC for that reason.
The patch primarily adds command line support for HA transitions. Have tested this manually several times on a real cluster.
We need JUnit tests for everything. We can fall back to manual tests for hard-to-test race conditions or security features that are not possible to address otherwise, but manual testing is not a substitute for JUnit tests.
Sorry, should have been clearer - was referring primarily to the PB interface use. Let me address each point separately.
Let's try to avoid adding one more server. We added AdminService separately from client-service only for the sake of QoS for admin operations. HAAdminService should be listening on the same port as all other operations.
IIUC, the suggestion is to use the RPC server from AdminService. AdminService currently is an Active service and not an Always-On service, so it doesn't start until the RM transitions to Active. Moving the AdminService to Always-On requires defining the semantics when the RM is Standby.
Following that, we should use only one rmadmin CLI for fail-over commands too.
To do this, we need to have RMAdminCLI extend HAAdmin, and augment the run() method to call super.run() when applicable, and the usage needs to be augmented to include the HAAdmin usage.
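A rough sketch of what that merged CLI could look like; this is illustrative only, not the actual patch. Only resolveTarget() and the RMHAServiceTarget constructor shapes are taken from snippets quoted elsewhere in this discussion; the class name, HA_COMMANDS, and runRMAdminCommand() are hypothetical.

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;

  import org.apache.hadoop.ha.HAAdmin;
  import org.apache.hadoop.ha.HAServiceTarget;
  import org.apache.hadoop.yarn.client.RMHAServiceTarget;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  public class RMAdminCLISketch extends HAAdmin {

    private static final Set<String> HA_COMMANDS = new HashSet<String>(Arrays.asList(
        "-transitionToActive", "-transitionToStandby", "-getServiceState", "-checkHealth"));

    @Override
    public int run(String[] args) throws Exception {
      if (args.length > 0 && HA_COMMANDS.contains(args[0])) {
        return super.run(args);       // delegate the HA commands to HAAdmin
      }
      return runRMAdminCommand(args); // existing rmadmin handling (illustrative name)
    }

    @Override
    protected HAServiceTarget resolveTarget(String rmId) {
      try {
        // assumes the conf handed to the CLI is (or wraps) a YarnConfiguration
        return new RMHAServiceTarget((YarnConfiguration) getConf(), rmId);
      } catch (Exception e) {
        throw new IllegalArgumentException("Cannot resolve RM " + rmId, e);
      }
    }

    private int runRMAdminCommand(String[] args) {
      // placeholder for the pre-existing refreshQueues/refreshNodes/... handling
      return -1;
    }
  }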
RMHAProtocolService: We don't directly use PB interfaces in YARN. Let's not change that here - use YarnRPC to create servers.
YARN expects the actual PB/PBImpl files to be at a particular location, and can't find the corresponding files when using HAServiceProtocol from common. Hence, had to use PB interfaces.
No tests specifically for the new code added here?
The patch primarily adds command line support for HA transitions. Have tested this manually several times on a real cluster.
Trying to use Common code is forcing us to start a new RPC server? I don't quite follow it; more details, maybe?
vinodkv, the inconsistencies with the rest of the YARN code come primarily from our attempt to re-use Common code. Our attempt has been to use the Common code as much as possible. Do you think we should keep it independent?
Catching up on all the HA stuff. Starting with this patch.
Quick comments on the patch:
- Let's try to avoid adding one more server. We added AdminService separately from client-service only for the sake of QoS for admin operations. HAAdminService should be listening on the same port as all other operations.
- Following that, we should use only one rmadmin CLI for fail-over commands too.
- RMHAProtocolService: We don't directly use PB interfaces in YARN. Let's not change that here - use YarnRPC to create servers.
- No tests specifically for the new code added here?
Will keep digging into the fail-over code more. Let me know if some of my assumptions are wrong. Tx.
Per the discussion on YARN-1309 and YARN-142, it looks like we should throw YarnException and not IOException. However, the actual exceptions to be thrown are defined in HAServiceProtocol, which doesn't have YarnException listed.
So, I guess we will have to leave the RMHAProtocolService as is.
Thanks Bikas.
There isn't any need for this to wrap the IOException in another exception. The base AdminService protocol signature already supports throwing IOException.
Agree. I did consider leaving it as IOE. However, there are several places in AdminService where an IOE is being wrapped into a YarnException. We should probably address all of them together in another JIRA. Created YARN-1309.
Looks good to me. Will give a day or so for some other committers to take a look.
There isn't any need for this to wrap the IOException in another exception. The base AdminService protocol signature (ResourceManagerAdministrationProtocol) already supports throwing IOException. If it's small enough, we could fix it here or do it in a separate jira.
   private UserGroupInformation checkAcls(String method) throws YarnException {
-    UserGroupInformation user;
     try {
-      user = UserGroupInformation.getCurrentUser();
+      return RMServerUtils.verifyAccess(adminAcl, method, LOG);
     } catch (IOException ioe) {
-      LOG.warn("Couldn't get current user", ioe);
-
-      RMAuditLogger.logFailure("UNKNOWN", method,
-          adminAcl.toString(), "AdminService",
-          "Couldn't get current user");
       throw RPCUtil.getRemoteException(ioe);
     }
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12608393/yarn-1068-11.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2177//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2177//console
This message is automatically generated.
Let's stick to what the existing YARN code looks like, IMO. If the existing approach can be made better, then we should probably do it in a different patch. vinodkv, do you know why we are not following the Common naming practice for the above?
The new name doesn't seem to follow the convention based on other names in that file: YARN_SECURITY_SERVICE_AUTHORIZATION_FOO
Right. The property comes from Common HA code, and I am using it as is. The alternative is to add YARN_SECURITY_SERVICE_AUTHORIZATION_HA_SERVICE_PROTOCOL to YarnConfiguration and point it to the other one. Is that the preferred approach?
Will post a patch addressing the other comments.
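For illustration, the alternative mentioned above would roughly amount to a YARN-named alias for the Common key (the constant name is hypothetical; the Common key is the one quoted in the snippet below):

  // Hypothetical alias in YarnConfiguration pointing at the Common HA ACL key.
  public static final String YARN_SECURITY_SERVICE_AUTHORIZATION_HA_SERVICE_PROTOCOL =
      CommonConfigurationKeys.SECURITY_HA_SERVICE_PROTOCOL_ACL;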
This should probably create a new conf and override that, instead of changing things in the original conf.
+    YarnConfiguration conf = (YarnConfiguration) getConf();
+    conf.set(YarnConfiguration.RM_HA_ID, rmId);
+    return new RMHAServiceTarget(conf);
Shouldn't it be "transitionToActive"?
+    RMAuditLogger.logFailure(user.getShortUserName(), "transitionToStandby",
+        adminAcl.toString(), "RMHAProtocolService",
+        "Exception transitioning to active");
We shouldn't be wrapping some unknown exception in an AccessControlException.
+  private UserGroupInformation checkAccess(String method) throws AccessControlException {
+    try {
+      return RMServerUtils.verifyAccess(adminAcl, method, LOG);
+    } catch (YarnException e) {
+      throw new AccessControlException(e);
+    }
This method isn't even throwing an AccessControlException. Then why are transitionToStandby() etc. changing their signatures to throw AccessControlException?
+  public static UserGroupInformation verifyAccess(
+      AccessControlList acl, String method, final Log LOG)
+      throws YarnException {
The new name doesn't seem to follow the convention based on other names in that file: YARN_SECURITY_SERVICE_AUTHORIZATION_FOO
     new Service(
         YarnConfiguration.YARN_SECURITY_SERVICE_AUTHORIZATION_CONTAINER_MANAGEMENT_PROTOCOL,
         ContainerManagementProtocolPB.class),
+    new Service(
+        CommonConfigurationKeys.SECURITY_HA_SERVICE_PROTOCOL_ACL,
+        HAServiceProtocol.class),
Forgot to mention the testing done.
Ran a two-node cluster, each node running an RM, and used the CLI to transition each RM to active/standby, getServiceState, and checkHealth several (~100) times across the various versions of the patch put together. Verified the latest patch on a pseudo-distributed cluster with the configs for both RMs pointing to the same RM.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12607623/yarn-1068-10.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2155//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2155//console
This message is automatically generated.
Thanks bikassaha for the detailed review. Sorry for the delay in responding; I was caught up with some other issues. Uploaded a patch that addresses most of the comments:
- Audit logging in RMHAProtocolService
- Update AdminService also to use RMServerUtils#verifyAccess
- YarnConfiguration overrides updateConnectAddr as well to complement getSocketAddr
- Change argument names from nodeId to rmId
Cast not needed, right?
Currently, the RMHAServiceTarget constructor takes a YarnConfiguration as an argument and not a Configuration. We can change this, but I think it is better to be explicit about the kind of instance needed. Leaving the constructor as is requires the cast.
Should this be the RMHAServiceProtocol address? Admins and ZKFC would be connecting on this protocol, right?
Updated the description to reflect that the RM listens at that address for both the rmhaadmin CLI and the failover controller (ZKFC). I think we should leave the config name as ha.admin.address for the following reasons: (1) the user/admin understands "admin" better than "HAProtocolService", as the latter requires them to understand the protocol being used; (2) either the CLI or ZKFC is actually administering the HA state of the RM; (3) we don't use protocol names anywhere else in the configs.
Should this be the RMHAServiceProtocol address? Admins and ZKFC would be connecting on this protocol, right?
+  public static final String RM_HA_ADMIN_ADDRESS =
+      RM_HA_PREFIX + "admin.address";
...
+  <description>The address of the RM HA admin interface.</description>
+  <name>yarn.resourcemanager.ha.admin.address</name>
+  <value>${yarn.resourcemanager.hostname}:8034</value>
...
+  private AccessControlList adminAcl;
+  private Server haAdminServer;
instanceof check before creating new conf?
+  public void setConf(Configuration conf) {
+    if (conf != null) {
+      conf = new YarnConfiguration(conf);
+    }
+    super.setConf(conf);
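A minimal sketch of the instanceof guard suggested above (illustrative only, mirroring the quoted setConf):

  @Override
  public void setConf(Configuration conf) {
    if (conf != null && !(conf instanceof YarnConfiguration)) {
      // wrap only when the conf is not already a YarnConfiguration
      conf = new YarnConfiguration(conf);
    }
    super.setConf(conf);
  }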
More clear if we name the parameter rmId instead of nodeId
protected HAServiceTarget resolveTarget(String nodeId) {
Cast not needed, right?
+    try {
+      return new RMHAServiceTarget((YarnConfiguration)getConf(), nodeId);
Should this create its own copy of the conf instead of overriding the CLI's copy? If it's OK to override, then we should probably do it in the CLI itself rather than as a side effect of a constructor.
   public RMHAServiceTarget(YarnConfiguration conf, String nodeId)
+      throws IOException {
+    conf.set(YarnConfiguration.RM_HA_ID, nodeId);
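A sketch of the copy-then-set alternative raised above (illustrative; the constant, constructor, and field names are taken from the patch snippets quoted in this discussion):

  private InetSocketAddress haAdminServiceAddress;

  public RMHAServiceTarget(YarnConfiguration conf, String rmId) throws IOException {
    // work on a private copy so the caller's (CLI's) conf is left untouched
    YarnConfiguration targetConf = new YarnConfiguration(conf);
    targetConf.set(YarnConfiguration.RM_HA_ID, rmId);
    haAdminServiceAddress = targetConf.getSocketAddr(
        YarnConfiguration.RM_HA_ADMIN_ADDRESS,
        YarnConfiguration.DEFAULT_RM_HA_ADMIN_ADDRESS,
        YarnConfiguration.DEFAULT_RM_HA_ADMIN_PORT);
  }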
After this, how is someone supposed to get the socket address? conf.getSocketAddr() will do the HAUtil magic instead of directly picking up what is set by this code?
+    haAdminServer.start();
+    conf.updateConnectAddr(YarnConfiguration.RM_HA_ADMIN_ADDRESS,
+        haAdminServer.getListenerAddress());
Good to refactor. Looks like we should change AdminService to use the same method. Also, this method returns a user value that is used by AdminService to do audit logging. It would be good to follow that pattern and do audit logging in HAService, at least for the state-transition operations and probably not for the health-check operations.
+ try {
+ RMServerUtils.verifyAccess(adminAcl, method, LOG);
Any notes on testing? There don't seem to be any new unit tests added.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12607094/yarn-1068-9.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2135//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2135//console
This message is automatically generated.
Patch to fix the javadoc and findbugs issues. The test failures seem unrelated.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12607039/yarn-1068-8.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections
The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.client.api.impl.TestNMClient
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2127//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2127//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2127//console
This message is automatically generated.
Patch on top of YARN-1232, that addresses Bikas' earlier comments about ACL checking as well.
Let's see how YARN-986 and YARN-1232 develop and make a call based on that. I would like to avoid committing stuff that will soon be changed a lot.
Thanks bikassaha, agree with most of your points.
AdminService does not use the HAServiceProtocolServerSideTranslatorPB pattern
The reason for this is our attempt to reuse most of the common code - protos and client implementations.
Having thought about this, it seems to me that this jira is actually blocked by YARN-986.
To fix the admin support in its entirety, I agree that we need YARN-1232 and YARN-986. That said, for ease of development, I would propose splitting the admin support into two parts (JIRAs): basic support (this JIRA) to go in first to help test YARN-1232 and YARN-986, and complete admin support that adds the remaining parts. Otherwise, we would need to apply this patch on top of those other JIRAs to test. Thoughts?
It would be educative to compare the HAAdmin server start code with an existing RM admin server like the AdminService. I notice 2 things.
1) AdminService does not use the HAServiceProtocolServerSideTranslatorPB pattern
2) AdminService does something with HADOOP_SECURITY_AUTHORIZATION which is missing in HAAdminService. This probably defines who has access to perform the admin operations. We will likely need that for the HAAdmin, right?
Having thought about this, it seems to me that this jira is actually blocked by YARN-986. Without a concept of a logical name, how can we expect the CLI etc. to find the correct RM address from configuration? The client conf files would be expected to have entries for all RM instances, and we would need to be able to issue admin commands to any one of them. So we need to be able to address them via a logical name, right? So the current approach that picks the RM_HA_ADMIN_SERVICE address does not seem like a viable solution. Similarly, the server conf files would need to tell the server what its logical name is so that it can pick instance-specific configurations. This is precisely why we have the HAAdmin.resolveTarget() method.
Again, it would be educative to look at NNHAServiceTarget for the client side and the constructor for NameNode, where it uses the logical name to translate and rewrite the server-side conf.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12604842/yarn-1068-7.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2000//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2000//console
This message is automatically generated.
One nit: in RMHAProtocolService, serviceStop() should be symmetric with the start, in the sense that it should do the if (haEnabled) check to stop the HAAdmin server (instead of doing this check in the HAAdmin service itself).
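Something along these lines, for illustration (stopHAAdminServer() is a hypothetical helper mirroring the start path):

  @Override
  protected void serviceStop() throws Exception {
    if (haEnabled) {
      stopHAAdminServer();  // symmetric with the check done in serviceStart()
    }
    super.serviceStop();
  }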
bikassaha, when you get a chance, can you review the latest patch?
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12604708/yarn-1068-6.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1991//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1991//console
This message is automatically generated.
Minor update to the patch - RMHAProtocolService should continue to be an AbstractService, and not a CompositeService.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12604406/yarn-1068-5.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1988//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1988//console
This message is automatically generated.
The client side code should probably be invoked via RMProxy like the other protocols are accessed today. In a failover setup the following may not be valid.
I have updated the patch to fetch the InetSocketAddress from ClientRMProxy#getRMAddress. Not using ClientRMProxy#createProxy because we want the HA related admin calls to go to the intended RM irrespective of whether it is active or not.
I think YARN-1028 (ConfiguredFailoverProxy) should update this if need be, along with other changes it makes.
Updating the patch to move the HA admin RPC server into RMHAProtocolService.
If I understand this correctly, the jira is adding RPC server support for HAServiceProtocol on the server side and client support for admins. The RPC server is meant for the HAServiceProtocol. Why do we need to create an RMHAAdminService? This RPC server should simply be embedded in the existing RMHAProtocolService that logically implements the HAServiceProtocol. Isn't that needed for ZKFC (and other clients like the admin CLI) to connect to the RM and issue transition commands? Other protocol servers are embedded similarly in the RM, e.g. ApplicationMasterService creates the ApplicationMasterProtocol RPC server and passes itself into the RPC server for callbacks.
The client side code should probably be invoked via RMProxy like the other protocols are accessed today. In a failover setup the following may not be valid.
+ RMHAServiceTarget() {
+ YarnConfiguration targetConf = new YarnConfiguration(getConf());
+ haAdminServiceAddress = targetConf.getSocketAddr(
+ YarnConfiguration.RM_HA_ADMIN_ADDRESS,
+ YarnConfiguration.DEFAULT_RM_HA_ADMIN_ADDRESS,
+ YarnConfiguration.DEFAULT_RM_HA_ADMIN_PORT);
Another downside is having yet another port to configure on the RM: one for the HAProtocol and one for the HAAdmin. Do we envisage an RM where one of them is present without the other?
Only the RMHAAdminService listens on a port, not the RMHAProtocolService. Am I missing something here?
Let me elaborate. The MockRM overrides several RM#create*() methods to create services which don't start their listening servers. Similarly, I wanted to override createRMHAProtocolService in MockRM to create an instance of RMHAProtocolService which doesn't start the HA admin listener. Now, consider the two scenarios:
- HAAdmin functionality embedded in RMHAProtocolService
RMHAProtocolService#serviceStart() would look like the following:

  protected void serviceStart() {
    if (haEnabled) {
      transitionToStandby(true);
      // create admin server logic
      // start admin server
    } else {
      transitionToActive();
    }
    super.serviceStart();
  }
and the MockRM override would look like:
  @Override
  protected RMHAProtocolService createRMHAProtocolService() {
    return new RMHAProtocolService() {
      @Override
      protected void serviceStart() {
        if (haEnabled) {
          transitionToStandby(true);
        } else {
          transitionToActive();
        }
        super.serviceStart();
      }
    };
  }
Any changes to RMHAProtocolService#serviceStart should be replicated to MockRM.
- RMHAAdminService implements the admin functionality and is part of RMHAProtocolService (latest patch)
RMHAProtocolService#createRMHAAdminService() creates the admin service to use. If not null, the service is added to RMHAProtocolService and is inited/started/stopped along with it. Now, all MockRM needs to do is return null in MockRM#createRMHAAdminService.
Another downside is having yet another port to configure on the RM: one for the HAProtocol and one for the HAAdmin. Do we envisage an RM where one of them is present without the other?
This makes it very simple to create a RMHAProtocolService without the admin parts without having to override all its service*() calls. In the current patch, MockRM does the following.
Sorry, didn't quite follow this.
Having a separate RMHAAdminService allows handling it through the super.service*() calls in RMHAProtocolService. This makes it very simple to create a RMHAProtocolService without the admin parts, without having to override all of its service*() calls. In the current patch, MockRM does the following. If it were all in one RMHAService, this would be much longer and would need to be updated every time the actual RMHAService service* methods get updated. Also, I don't see a particular downside to having it as a separate service, except that it is one more class. The RM itself still sees a single service in RMHAProtocolService.

  protected RMHAProtocolService createRMHAProtocolService() {
    return new RMHAProtocolService(this) {
      @Override
      protected RMHAAdminService createRMHAAdminService() {
        return null;
      }
    };
  }
I meant the first one. I was wondering whether we should have a single RMHAService that does both the protocol and the admin parts. It seems to me that they belong in one place. So a pro/con discussion would help.
What are the pros and cons of making the new service part of the existing RMHAService instead of creating another RPC endpoint?
Sorry Bikas, I am not sure I understand the question. Can you elaborate a bit? A little background on the thought process that went in:
- Should RMHAAdminService be a part of RMHAProtocolService? It can be; I just moved it to a separate service to make the code easier to understand and maintain. We can definitely merge it back.
- Should RMHAAdminService be a part of AdminService and use the same port instead of listening on another? I initially thought of doing that, but refrained for two reasons: (1) It is better to have two listeners to address two protocols, the AdminProtocol and the HAAdminProtocol. (2) AdminService itself is not an Always-On service at the moment. To move it to Always-On, we need to make it HA-aware, which could potentially take more time.
What are the pros and cons of making the new service part of the existing RMHAService instead of creating another RPC endpoint?
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12603885/yarn-1068-4.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1960//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1960//console
This message is automatically generated.
Thanks Alejandro for catching it; I completely missed that.
Uploading a patch that creates RMHAAdminService only when HA is enabled.
Unless I'm missing something, the HAAdmin service is always on and listening, is this correct? Always-on is OK, but do we need it listening on a port?
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12603703/yarn-1068-3.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1948//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1948//console
This message is automatically generated.
Updated patch moves the ipc.Server creation and start into start(), and changes MockRM to not start the server. Verified that these changes fix the test failures.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12603434/yarn-1068-2.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.client.api.impl.TestYarnClient
org.apache.hadoop.yarn.client.api.impl.TestNMClient
org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice.TestApplicationMasterService
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
org.apache.hadoop.yarn.server.resourcemanager.TestRM
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1940//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1940//console
This message is automatically generated.
The patch seems to depend on another patch being applied first; which one?
I am uploading a preliminary patch that adds admin support for HA operations, to get feedback on the approach.
The patch is very much along the lines of the HDFS HA admin implementation and reuses the Common code for the same. Outline:
- RMHAProtocolService starts an RPC server for HA commands (a rough sketch of this wiring follows at the end of this comment).
- The yarn rmhaadmin command invokes RMHAdminCLI, which extends HAAdmin.
I haven't figured out how to use the ClientRMProxy while using HAAdmin yet. Would love to hear any thoughts/inputs on that.
Pending tasks: (1) yarn-site, (2) RPC server instantiation through YARNRPC calls like in AdminService.
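Roughly, the sort of server-side wiring the first outline point implies (a sketch only, assuming Hadoop Common's HAServiceProtocolPB plumbing; the actual patch may differ, and the class and method names here are illustrative):

  import java.net.InetSocketAddress;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.ha.HAServiceProtocol;
  import org.apache.hadoop.ha.proto.HAServiceProtocolProtos.HAServiceProtocolService;
  import org.apache.hadoop.ha.protocolPB.HAServiceProtocolPB;
  import org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB;
  import org.apache.hadoop.ipc.ProtobufRpcEngine;
  import org.apache.hadoop.ipc.RPC;

  import com.google.protobuf.BlockingService;

  public final class HAAdminServerSketch {
    // haProtocol is the local HAServiceProtocol implementation (the HA service itself).
    public static RPC.Server start(Configuration conf, HAServiceProtocol haProtocol,
        InetSocketAddress bindAddress) throws Exception {
      RPC.setProtocolEngine(conf, HAServiceProtocolPB.class, ProtobufRpcEngine.class);
      BlockingService haPbService = HAServiceProtocolService.newReflectiveBlockingService(
          new HAServiceProtocolServerSideTranslatorPB(haProtocol));
      RPC.Server server = new RPC.Builder(conf)
          .setProtocol(HAServiceProtocolPB.class)
          .setInstance(haPbService)
          .setBindAddress(bindAddress.getHostName())
          .setPort(bindAddress.getPort())
          .build();
      server.start();
      return server;
    }
  }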
FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/)
YARN-1068. Add admin support for HA operations (Karthik Kambatla via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536888)