Hadoop YARN
  1. Hadoop YARN
  2. YARN-128 RM Restart
  3. YARN-353

Add Zookeeper-based store implementation for RMStateStore

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: resourcemanager
    • Labels:
      None

      Description

      Add store that write RM state data to ZK

      1. YARN-353.16.1.patch
        50 kB
        Hitesh Shah
      2. YARN-353.16.patch
        51 kB
        Karthik Kambatla
      3. YARN-353.15.patch
        49 kB
        Karthik Kambatla
      4. YARN-353.14.patch
        48 kB
        Karthik Kambatla
      5. YARN-353.13.patch
        44 kB
        Karthik Kambatla
      6. YARN-353.12.patch
        44 kB
        Karthik Kambatla
      7. yarn-353-12-wip.patch
        43 kB
        Karthik Kambatla
      8. YARN-353.11.patch
        44 kB
        Karthik Kambatla
      9. YARN-353.10.patch
        43 kB
        Jian He
      10. YARN-353.9.patch
        42 kB
        Jian He
      11. YARN-353.8.patch
        42 kB
        Jian He
      12. YARN-353.7.patch
        43 kB
        Jian He
      13. YARN-353.6.patch
        43 kB
        Jian He
      14. YARN-353.5.patch
        35 kB
        Jian He
      15. YARN-353.4.patch
        35 kB
        Jian He
      16. YARN-353.3.patch
        35 kB
        Jian He
      17. YARN-353.2.patch
        34 kB
        Jian He
      18. YARN-353.1.patch
        34 kB
        Bikas Saha

        Issue Links

          Activity

          Hide
          Bikas Saha added a comment -

          Attaching patch from YARN-231 after rebasing with FS changes. Some of the discussion is on YARN-231.

          Show
          Bikas Saha added a comment - Attaching patch from YARN-231 after rebasing with FS changes. Some of the discussion is on YARN-231 .
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12566356/YARN-353.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/365//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/365//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/365//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12566356/YARN-353.1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/365//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/365//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/365//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Thanks Bikas. Mostly looks good. Can you address the findbugs and rebase against trunk. Will post a detailed review (couple of nits) on the updated patch.

          Show
          Karthik Kambatla added a comment - Thanks Bikas. Mostly looks good. Can you address the findbugs and rebase against trunk. Will post a detailed review (couple of nits) on the updated patch.
          Hide
          Jian He added a comment -

          I'm taking this over

          Show
          Jian He added a comment - I'm taking this over
          Hide
          Jian He added a comment -

          rebased the patch and added RMDelegationToken restore implementation for the ZKStateStore

          Show
          Jian He added a comment - rebased the patch and added RMDelegationToken restore implementation for the ZKStateStore
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590350/YARN-353.2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1414//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1414//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1414//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12590350/YARN-353.2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1414//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1414//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1414//console This message is automatically generated.
          Hide
          Devaraj K added a comment -

          The patch overall looks good, here are my observations on the patch.

          1.

          +  <property>
          +    <description>ACL's to be used for ZooKeeper znodes.
          +    This may be supplied when using
          +    org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
          +    as the value for yarn.resourcemanager.store.class</description>
          +    <name>yarn.resourcemanager.zk.rm-state-store.timeout.ms</name>
          +    <!--value>world:anyone:rwcda</value-->
          +  </property>
          

          Here configuration name should be "yarn.resourcemanager.zk.rm-state-store.acl".

          2.

          +  // protected to mock for testing
          +  protected synchronized ZooKeeper getNewZooKeeper() throws Exception {
          

          Can we also annotate with @VisibleForTesting for this method?

          3.

          +  /** HostPort of ZK server for ZKRMStateStore */
          +    <description>HostPort of the ZooKeeper server when using 
          

          These two places can we use Host:Port instead of HostPort for comment/description.

          4.

          +    zkHostPort = conf.get(YarnConfiguration.ZK_RM_STATE_STORE_ADDRESS);
          

          Can we use the default value for this config with this as present for other props,

          +    <!--value>127.0.0.1:2181</value-->
          

          5.

          +  public static final String DEFAULT_ZK_RM_STATE_STORE_PARENT_PATH = "";
          

          Can we use the default value for this config with this instead of having empty,

          +    <!--value>/rmstore</value-->
          
          Show
          Devaraj K added a comment - The patch overall looks good, here are my observations on the patch. 1. + <property> + <description> ACL's to be used for ZooKeeper znodes. + This may be supplied when using + org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore + as the value for yarn.resourcemanager.store.class </description> + <name> yarn.resourcemanager.zk.rm-state-store.timeout.ms </name> + <!--value> world:anyone:rwcda </value--> + </property> Here configuration name should be "yarn.resourcemanager.zk.rm-state-store.acl". 2. + // protected to mock for testing + protected synchronized ZooKeeper getNewZooKeeper() throws Exception { Can we also annotate with @VisibleForTesting for this method? 3. + /** HostPort of ZK server for ZKRMStateStore */ + <description> HostPort of the ZooKeeper server when using These two places can we use Host:Port instead of HostPort for comment/description. 4. + zkHostPort = conf.get(YarnConfiguration.ZK_RM_STATE_STORE_ADDRESS); Can we use the default value for this config with this as present for other props, + <!--value> 127.0.0.1:2181 </value--> 5. + public static final String DEFAULT_ZK_RM_STATE_STORE_PARENT_PATH = ""; Can we use the default value for this config with this instead of having empty, + <!--value> /rmstore </value-->
          Hide
          Jian He added a comment -

          Devaraj, thanks for your review
          new patch, fixed the findbug and comments.

          Show
          Jian He added a comment - Devaraj, thanks for your review new patch, fixed the findbug and comments.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590474/YARN-353.3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1416//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1416//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12590474/YARN-353.3.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1416//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1416//console This message is automatically generated.
          Hide
          Devaraj K added a comment -

          The latest patch looks good to me except one nit.

          +  public static final String DEFAULT_ZK_RM_STATE_STORE_PARENT_PATH = "rmstore";
          

          Here this path should start with '/', otherwise zkclient will throw IllegalArgumentException.

          Show
          Devaraj K added a comment - The latest patch looks good to me except one nit. + public static final String DEFAULT_ZK_RM_STATE_STORE_PARENT_PATH = "rmstore" ; Here this path should start with '/', otherwise zkclient will throw IllegalArgumentException.
          Hide
          Jian He added a comment -

          right, thanks for pointing that out.
          Updated the patch, made some changes along with that.

          Show
          Jian He added a comment - right, thanks for pointing that out. Updated the patch, made some changes along with that.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590691/YARN-353.4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1420//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1420//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12590691/YARN-353.4.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1420//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1420//console This message is automatically generated.
          Hide
          Bikas Saha added a comment -

          I dont think it makes sense to have default value for this. ZK location is not something we control and we cannot assume it to be running on some default location. The commented value in the default.xml file is just for a syntax example.

          +  public static final String DEFAULT_ZK_RM_STATE_STORE_ADDRESS =
          +      "127.0.0.1:2181";
          

          Wherever we are doing multiple operations, we should probably use the ZK multi API's to guarantee atomic operations.

          +        + latestSequenceNumber);
          +    try {
          +      if (dtSequenceNumberPath != null) {
          +        deleteWithRetries(dtSequenceNumberPath, 0);
          +      }
          +      createWithRetries(latestSequenceNumberPath, null, zkAcl,
          +        CreateMode.PERSISTENT);
          +    } catch (Exception e) {
          +      LOG.info("Error in storing " + dtSequenceNumberPath);
          +      throw e;
          +    }
          +    dtSequenceNumberPath = latestSequenceNumberPath;
          
          Show
          Bikas Saha added a comment - I dont think it makes sense to have default value for this. ZK location is not something we control and we cannot assume it to be running on some default location. The commented value in the default.xml file is just for a syntax example. + public static final String DEFAULT_ZK_RM_STATE_STORE_ADDRESS = + "127.0.0.1:2181" ; Wherever we are doing multiple operations, we should probably use the ZK multi API's to guarantee atomic operations. + + latestSequenceNumber); + try { + if (dtSequenceNumberPath != null ) { + deleteWithRetries(dtSequenceNumberPath, 0); + } + createWithRetries(latestSequenceNumberPath, null , zkAcl, + CreateMode.PERSISTENT); + } catch (Exception e) { + LOG.info( "Error in storing " + dtSequenceNumberPath); + throw e; + } + dtSequenceNumberPath = latestSequenceNumberPath;
          Hide
          Jian He added a comment -

          I dont think it makes sense to have default value for this. ZK location is not something we control and we cannot assume it to be running on some default location.

          Yes, we can not assume which location ZK is ruining on, but I think the result would be the same if we provide a default or leave it empty, botch cases should raise connect exception or something, which leads the user to config the true address. One bonus doing such might make user easier in test mode where ZK is running on its defaults, your opinion?

          Show
          Jian He added a comment - I dont think it makes sense to have default value for this. ZK location is not something we control and we cannot assume it to be running on some default location. Yes, we can not assume which location ZK is ruining on, but I think the result would be the same if we provide a default or leave it empty, botch cases should raise connect exception or something, which leads the user to config the true address. One bonus doing such might make user easier in test mode where ZK is running on its defaults, your opinion?
          Hide
          Bikas Saha added a comment -

          No. It must be required for the user to specify this. We cannot assume some random address if the user has not specified a value. The code should throw an exception if this is not specified.

          Show
          Bikas Saha added a comment - No. It must be required for the user to specify this. We cannot assume some random address if the user has not specified a value. The code should throw an exception if this is not specified.
          Hide
          Jian He added a comment -

          Any downside of doing that ?

          Show
          Jian He added a comment - Any downside of doing that ?
          Hide
          Bikas Saha added a comment -

          Downside of doing what? Throwing clear exception will alert the user that the address is not configured and so the RM will not start.

          Show
          Bikas Saha added a comment - Downside of doing what? Throwing clear exception will alert the user that the address is not configured and so the RM will not start.
          Hide
          Jian He added a comment -

          sorry, I meant downside of giving a default ZK address. yeah, throwing an exception would be clear.

          Show
          Jian He added a comment - sorry, I meant downside of giving a default ZK address. yeah, throwing an exception would be clear.
          Hide
          Bikas Saha added a comment -

          I really don't know how to explain the downside of having a bogus default ZK address to which the RM will try to connect. This should be obvious. Either that or I am seeing things differently from you.

          Show
          Bikas Saha added a comment - I really don't know how to explain the downside of having a bogus default ZK address to which the RM will try to connect. This should be obvious. Either that or I am seeing things differently from you.
          Hide
          Devaraj K added a comment -

          If we see,

           public static final String RM_RESOURCE_TRACKER_ADDRESS =
              RM_PREFIX + "resource-tracker.address";
            public static final int DEFAULT_RM_RESOURCE_TRACKER_PORT = 8031;
            public static final String DEFAULT_RM_RESOURCE_TRACKER_ADDRESS =
              "0.0.0.0:" + DEFAULT_RM_RESOURCE_TRACKER_PORT;
          

          Here also we should configure the RM address if the RM is running in other machine than NM, otherwise it will be fine.

          Similarly we can provide the default value for zk address and will be useful if it is single node cluster or test environment.

          Show
          Devaraj K added a comment - If we see, public static final String RM_RESOURCE_TRACKER_ADDRESS = RM_PREFIX + "resource-tracker.address" ; public static final int DEFAULT_RM_RESOURCE_TRACKER_PORT = 8031; public static final String DEFAULT_RM_RESOURCE_TRACKER_ADDRESS = "0.0.0.0:" + DEFAULT_RM_RESOURCE_TRACKER_PORT; Here also we should configure the RM address if the RM is running in other machine than NM, otherwise it will be fine. Similarly we can provide the default value for zk address and will be useful if it is single node cluster or test environment.
          Hide
          Jian He added a comment -

          New patch. Talked with Bikas offline, still remove default zk address as its out of yarn control.
          Also changed to use zk multi API to do multiple operations.

          Show
          Jian He added a comment - New patch. Talked with Bikas offline, still remove default zk address as its out of yarn control. Also changed to use zk multi API to do multiple operations.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12591477/YARN-353.5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1439//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1439//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591477/YARN-353.5.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1439//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1439//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Few comments/nits before we get this in:

          1. Add test to check the disconnect and reconnect logic, essentially the switch-cases in processWatchEvent
          2. YarnConfiguration: how about creating a common prefix for all of zk-state-store related parameters?
          3. Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3.
          4. ZKRMStateStore#getNewZooKeeper need not be synchronized
          5. Might be cleaner to move zkDoWithRetries to ZkAction
          6. Just thinking out loud: would it make sense to move the *WithRetries methods to ZkAction itself, and may be rename it if required. All of the methods seem like util methods and moving them out might make it more readable
          Show
          Karthik Kambatla added a comment - Few comments/nits before we get this in: Add test to check the disconnect and reconnect logic, essentially the switch-cases in processWatchEvent YarnConfiguration: how about creating a common prefix for all of zk-state-store related parameters? Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3. ZKRMStateStore#getNewZooKeeper need not be synchronized Might be cleaner to move zkDoWithRetries to ZkAction Just thinking out loud: would it make sense to move the *WithRetries methods to ZkAction itself, and may be rename it if required. All of the methods seem like util methods and moving them out might make it more readable
          Hide
          Jian He added a comment -

          Thanks for the review, Karthik

          YarnConfiguration: how about creating a common prefix for all of zk-state-store related parameters?

          Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3.

          ZKRMStateStore#getNewZooKeeper need not be synchronized

          fixed

          Might be cleaner to move zkDoWithRetries to ZkAction

          we can implement no-retry functionalities with ZkAction if separate zkDoWithRetries out of ZkAction. same reason for 6

          New patch also added test case for ZKClient disconnect and reconnect logic.

          Show
          Jian He added a comment - Thanks for the review, Karthik YarnConfiguration: how about creating a common prefix for all of zk-state-store related parameters? Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3. ZKRMStateStore#getNewZooKeeper need not be synchronized fixed Might be cleaner to move zkDoWithRetries to ZkAction we can implement no-retry functionalities with ZkAction if separate zkDoWithRetries out of ZkAction. same reason for 6 New patch also added test case for ZKClient disconnect and reconnect logic.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12592363/YARN-353.6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1482//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1482//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1482//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592363/YARN-353.6.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1482//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1482//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1482//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12592413/YARN-353.7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1484//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1484//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1484//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592413/YARN-353.7.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1484//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1484//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1484//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Jian He, the patch still seems to have NUM_RETRIES. Also, can you take a look at the test failure and findbugs warnings. Thanks.

          Show
          Karthik Kambatla added a comment - Jian He , the patch still seems to have NUM_RETRIES. Also, can you take a look at the test failure and findbugs warnings. Thanks.
          Hide
          Bikas Saha added a comment -

          ZKRMStateStore#getNewZooKeeper need not be synchronized

          fixed

          The code is derived from ActiveStandyLeaderElector code in hadoop common. It was synchronized there for a race condition that showed up in testing. I would like to keep the synchronization as it was in the original patch.

          the patch still seems to have NUM_RETRIES

          Why should NUM_RETRIES not be there?

          Show
          Bikas Saha added a comment - ZKRMStateStore#getNewZooKeeper need not be synchronized fixed The code is derived from ActiveStandyLeaderElector code in hadoop common. It was synchronized there for a race condition that showed up in testing. I would like to keep the synchronization as it was in the original patch. the patch still seems to have NUM_RETRIES Why should NUM_RETRIES not be there?
          Hide
          Karthik Kambatla added a comment -

          Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3.

          fixed

          Why should NUM_RETRIES not be there?

          Was just noting that: the latest patch has the non-configurable NUM_RETRIES, it should exist but be configurable. If it is configurable, we should probably change the name of the variable.

          Show
          Karthik Kambatla added a comment - Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3. fixed Why should NUM_RETRIES not be there? Was just noting that: the latest patch has the non-configurable NUM_RETRIES, it should exist but be configurable. If it is configurable, we should probably change the name of the variable.
          Hide
          Jian He added a comment -

          New patch made NUM_RETRIES configurable.
          Changed removeApplicationState to use multi api to remove both app state and attempts state at the same time. Also fixed the warnings.

          Show
          Jian He added a comment - New patch made NUM_RETRIES configurable. Changed removeApplicationState to use multi api to remove both app state and attempts state at the same time. Also fixed the warnings.
          Hide
          Karthik Kambatla added a comment -

          Looks good. +1 pending Jenkins.

          Show
          Karthik Kambatla added a comment - Looks good. +1 pending Jenkins.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12593045/YARN-353.8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1520//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1520//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1520//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593045/YARN-353.8.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1520//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1520//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1520//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          For the findbugs warning around NUM_RETRIES, we should probably make it non-static numRetries.

          Show
          Karthik Kambatla added a comment - For the findbugs warning around NUM_RETRIES, we should probably make it non-static numRetries.
          Hide
          Sandy Ryza added a comment -

          The patch looks good to me, other than the findbugs warning and

          +    LOG
          +      .info("Removing RMDelegationToken_" + rmDTIdentifier.getSequenceNumber());
          

          which should be broken up to have LOG and .info on the same line.

          Have any manual steps been performed to verify that this works on a cluster?

          Show
          Sandy Ryza added a comment - The patch looks good to me, other than the findbugs warning and + LOG + .info( "Removing RMDelegationToken_" + rmDTIdentifier.getSequenceNumber()); which should be broken up to have LOG and .info on the same line. Have any manual steps been performed to verify that this works on a cluster?
          Hide
          Jian He added a comment -

          Uploaded a patch to fix comments and findbugs

          Show
          Jian He added a comment - Uploaded a patch to fix comments and findbugs
          Hide
          Jian He added a comment -

          Have any manual steps been performed to verify that this works on a cluster?

          Tested on real cluster and works fine.

          Show
          Jian He added a comment - Have any manual steps been performed to verify that this works on a cluster? Tested on real cluster and works fine.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12595519/YARN-353.9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1644//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1644//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1644//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595519/YARN-353.9.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1644//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1644//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1644//console This message is automatically generated.
          Hide
          Hitesh Shah added a comment -

          Jian He Could you please look at the findbug warnings raised by the patch?

          Show
          Hitesh Shah added a comment - Jian He Could you please look at the findbug warnings raised by the patch?
          Hide
          Hitesh Shah added a comment -

          First set of comments:

          Minor style nits:

          • Lines exceeding 80 chars in various places.
          • indentation issues in yarn-default.xml
          • ZK_STATE_STORE_PREFIX could be just RM_PREFIX + "zk.state-store." instead of "rm-state-store"
          • would it be preferable to not have a default value for the zk acl and enforce the user to set it if zk state store is being used? Obviously, any additional configuration headache but safer in forcing users to set it explicitly.
          • why are some settings commented out in yarn-default.xml and some aren't? Also, the documented defaults do not seem to match the default values used in code.
          • from hadoop-yarn-server-resourcemanager/pom.xml, versions of zookeeper should not be defined here. Use dependency management section defined in hadoop-project/pom.xml to have a consistent version across all components in hadoop.

          ZKRMStateStore comments:

          • "throw new YarnRuntimeException("No ZK_RM_STATE_STORE_ADDRESS specified");" is not useful as it does not clarify the actual property name that was not specified.
          • How is using a default of "world:anyone:rwcda" different from the fallback usage to the acls defined by "Ids.OPEN_ACL_UNSAFE"? If an invalid acl is defined, does it get treated as an error or fallback to open acls? Unit test for this maybe?
          • have not looked at the zk apis in detail, but why are we setting a watch using exists() before calling delete()? Also, return value of exists() is being ignored and delete called regardless?
          • should there be some versioning info that should be stored with the data for future versions to understand how to deserialize data?
          • even though we are using PB which will handle addition of new fields, it might be interesting to understand what version of the software wrote the data and use that info to understand whether some form of defaults need to be augmented or translations to be done, etc. To summarize, do we require a version id in the payload?
          • Is there any area that you see which may create problems when handling either upgrades or downgrades of software?
          • "Unknown child node with name" - should we throw an error instead or ignore unknowns?
          • Is it really necessary to have "LOG.info("Storing info for " to be at the INFO level? DEBUG maybe?
          • the version info from getData() seems to be unused for future setData calls? It seems like setData is only called from the unit tests?
          • rmstatestore and zk store both seem to be catching and logging the same exception before throwing it back down the chain. e.g Error storing info for attempt and Error storing appAttempt
          Show
          Hitesh Shah added a comment - First set of comments: Minor style nits: Lines exceeding 80 chars in various places. indentation issues in yarn-default.xml ZK_STATE_STORE_PREFIX could be just RM_PREFIX + "zk.state-store." instead of "rm-state-store" would it be preferable to not have a default value for the zk acl and enforce the user to set it if zk state store is being used? Obviously, any additional configuration headache but safer in forcing users to set it explicitly. why are some settings commented out in yarn-default.xml and some aren't? Also, the documented defaults do not seem to match the default values used in code. from hadoop-yarn-server-resourcemanager/pom.xml, versions of zookeeper should not be defined here. Use dependency management section defined in hadoop-project/pom.xml to have a consistent version across all components in hadoop. ZKRMStateStore comments: "throw new YarnRuntimeException("No ZK_RM_STATE_STORE_ADDRESS specified");" is not useful as it does not clarify the actual property name that was not specified. How is using a default of "world:anyone:rwcda" different from the fallback usage to the acls defined by "Ids.OPEN_ACL_UNSAFE"? If an invalid acl is defined, does it get treated as an error or fallback to open acls? Unit test for this maybe? have not looked at the zk apis in detail, but why are we setting a watch using exists() before calling delete()? Also, return value of exists() is being ignored and delete called regardless? should there be some versioning info that should be stored with the data for future versions to understand how to deserialize data? even though we are using PB which will handle addition of new fields, it might be interesting to understand what version of the software wrote the data and use that info to understand whether some form of defaults need to be augmented or translations to be done, etc. To summarize, do we require a version id in the payload? Is there any area that you see which may create problems when handling either upgrades or downgrades of software? "Unknown child node with name" - should we throw an error instead or ignore unknowns? Is it really necessary to have "LOG.info("Storing info for " to be at the INFO level? DEBUG maybe? the version info from getData() seems to be unused for future setData calls? It seems like setData is only called from the unit tests? rmstatestore and zk store both seem to be catching and logging the same exception before throwing it back down the chain. e.g Error storing info for attempt and Error storing appAttempt
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12595788/YARN-353.10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1655//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1655//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1655//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595788/YARN-353.10.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1655//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1655//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1655//console This message is automatically generated.
          Hide
          Jian He added a comment -

          Lines exceeding 80 chars in various places.

          indentation issues in yarn-default.xml

          from hadoop-yarn-server-resourcemanager/pom.xml, versions of zookeeper should not be defined here.

          fixed

          would it be preferable to not have a default value for the zk acl and enforce the user to set it if zk state store is being used? Obviously, any additional configuration headache but safer in forcing users to set it explicitly.

          the only user now reads and writes to zk state store is RM itself. in some sense we may trust RM. As you said, this may create more burden on user to configure.

          why are some settings commented out in yarn-default.xml and some aren't? Also, the documented defaults do not seem to match the default values used in code.

          Fixed. except yarn.resourcemanager.zk.state-store.address, which is expected for user to configure explicitly.

          "throw new YarnRuntimeException("No ZK_RM_STATE_STORE_ADDRESS specified");" is not useful as it does not clarify the actual property name that was not specified.

          fixed

          How is using a default of "world:anyone:rwcda" different from the fallback usage to the acls defined by "Ids.OPEN_ACL_UNSAFE"? If an invalid acl is defined, does it get treated as an error or fallback to open acls? Unit test for this maybe?

          added a test case to test invalid acl and should catch exception.

          have not looked at the zk apis in detail, but why are we setting a watch using exists() before calling delete()? Also, return value of exists() is being ignored and delete called regardless?

          the exists() call is only for the purpose of setting a watch, so that the following delete call will trigger the watch. In zookeeper , watch is one-time trigger. if the watch is already triggered, the watch event will not be sent unless the client sets the watch again

          should there be some versioning info that should be stored with the data for future versions to understand how to deserialize data? To summarize, do we require a version id in the payload?

          For now, each app or attempt info only have one version

          Is there any area that you see which may create problems when handling either upgrades or downgrades of software?

          may not support downgrade..

          "Unknown child node with name" - should we throw an error instead or ignore unknowns?

          In case other clients write un-related data under this path, maybe we don't want to crash RM because of this.

          Is it really necessary to have "LOG.info("Storing info for " to be at the INFO level? DEBUG maybe?

          Change the store* and remove* methods to use debug

          the version info from getData() seems to be unused for future setData calls? It seems like setData is only called from the unit tests?

          You mean the stat info ? yes, it's of no use now. setData is only used in unit tests.

          rmstatestore and zk store both seem to be catching and logging the same exception before throwing it back down the chain. e.g Error storing info for attempt and Error storing appAttempt

          fixed

          Show
          Jian He added a comment - Lines exceeding 80 chars in various places. indentation issues in yarn-default.xml from hadoop-yarn-server-resourcemanager/pom.xml, versions of zookeeper should not be defined here. fixed would it be preferable to not have a default value for the zk acl and enforce the user to set it if zk state store is being used? Obviously, any additional configuration headache but safer in forcing users to set it explicitly. the only user now reads and writes to zk state store is RM itself. in some sense we may trust RM. As you said, this may create more burden on user to configure. why are some settings commented out in yarn-default.xml and some aren't? Also, the documented defaults do not seem to match the default values used in code. Fixed. except yarn.resourcemanager.zk.state-store.address, which is expected for user to configure explicitly. "throw new YarnRuntimeException("No ZK_RM_STATE_STORE_ADDRESS specified");" is not useful as it does not clarify the actual property name that was not specified. fixed How is using a default of "world:anyone:rwcda" different from the fallback usage to the acls defined by "Ids.OPEN_ACL_UNSAFE"? If an invalid acl is defined, does it get treated as an error or fallback to open acls? Unit test for this maybe? added a test case to test invalid acl and should catch exception. have not looked at the zk apis in detail, but why are we setting a watch using exists() before calling delete()? Also, return value of exists() is being ignored and delete called regardless? the exists() call is only for the purpose of setting a watch, so that the following delete call will trigger the watch. In zookeeper , watch is one-time trigger. if the watch is already triggered, the watch event will not be sent unless the client sets the watch again should there be some versioning info that should be stored with the data for future versions to understand how to deserialize data? To summarize, do we require a version id in the payload? For now, each app or attempt info only have one version Is there any area that you see which may create problems when handling either upgrades or downgrades of software? may not support downgrade.. "Unknown child node with name" - should we throw an error instead or ignore unknowns? In case other clients write un-related data under this path, maybe we don't want to crash RM because of this. Is it really necessary to have "LOG.info("Storing info for " to be at the INFO level? DEBUG maybe? Change the store* and remove* methods to use debug the version info from getData() seems to be unused for future setData calls? It seems like setData is only called from the unit tests? You mean the stat info ? yes, it's of no use now. setData is only used in unit tests. rmstatestore and zk store both seem to be catching and logging the same exception before throwing it back down the chain. e.g Error storing info for attempt and Error storing appAttempt fixed
          Hide
          Jian He added a comment -

          The findbug -1 might be related to the test case is non-synchronously referencing 'zkClient' and 'zkSessionTimeout', which may not be an issue.

          Show
          Jian He added a comment - The findbug -1 might be related to the test case is non-synchronously referencing 'zkClient' and 'zkSessionTimeout', which may not be an issue.
          Hide
          Karthik Kambatla added a comment -

          Manually inspected the fields findbugs is complaining about - don't see any particular issues or additional need for synchronization.

          Uploading patch that adds exclusions for the two fields in question.

          Show
          Karthik Kambatla added a comment - Manually inspected the fields findbugs is complaining about - don't see any particular issues or additional need for synchronization. Uploading patch that adds exclusions for the two fields in question.
          Hide
          Karthik Kambatla added a comment -

          YARN-353.11.patch is the patch with findbugs exclusions.

          Show
          Karthik Kambatla added a comment - YARN-353 .11.patch is the patch with findbugs exclusions.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12596465/YARN-353.11.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1665//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1665//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596465/YARN-353.11.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1665//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1665//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Hitesh Shah, will you be able to take a look at the latest patch and see if it addresses all your comments.

          Show
          Karthik Kambatla added a comment - Hitesh Shah , will you be able to take a look at the latest patch and see if it addresses all your comments.
          Hide
          Hitesh Shah added a comment -

          Comments:

          Identation still off in yarn-default.xml

          Not sure if a default value of "<!-value>127.0.0.1:2181</value->" should be mentioned in yarn-default.xml.

          hadoop-yarn-server-resourcemanager/pom.xml is still not fixed - contains the version info.

          LOG.debug() should be wrapped within "if LOG.isDebugEnabled() {}"

          In "abstract class ZKAction<T>", member variables should be final.

          findbugs issue - zkSessionTimeout does not seem to be an issue but zkClient is something which is read/modified multiple times in various functions. Which functions are the ones that are/cannot be synchronized that access zkClient?

          +    if (zkHostPort == null) {
          +      throw new YarnRuntimeException(
          +        "No server address specified for zookeeper state store for Resource" +
          +        " Manager recovery. ZK_RM_STATE_STORE_ADDRESS is not configured.");
          
          • validation only for non-null and not a valid format?
          • ZK_RM_STATE_STORE_ADDRESS as a string?

          A question on general usage of zk: why is everything being stored at the top level of the tree and not a heirarchical structure i.e. attempts of a particular application stored under that application's dir?

          +    } catch (Exception e) {
          +      throw e;
          +    }
          
          • why the catch and re-throw? ( in multiple places )

          "os.close();" - close calls should be in a finally block ( in multiple places ).

          "LOG.info("Error in storing " + dtSequenceNumberPath);" - change to LOG.info("Error in storing " + dtSequenceNumberPath, e);"

          +  String getNodePath(String root, String nodeName) {
          +    return (root + "/" + nodeName);
          +  }
          
          • does ZK have a variable to define the node path separator which we can use instead of a magic string?

          "LOG.info("Created new connection for " + this);" --> logging 'this'? is there a toString() function?

          How is a connection failure to zk handled? i.e. getNewZooKeeper() throws an exception. Does the RM fail/shutdown? Is the connection retried at a later point?

          When creating a Zookeeper object, ZK apis support a base root path and all operations are done relative to the base root path? Any reason why we are not using that approach by initializing zk with zkRootNodePath ?

          +    try {
          +      createWithRetries(znodeWorkingPath, null, zkAcl, CreateMode.PERSISTENT);
          +      createWithRetries(zkRootNodePath, null, zkAcl, CreateMode.PERSISTENT);
          +      createWithRetries(rmDTSecretManagerRoot, null, zkAcl,
          +        CreateMode.PERSISTENT);
          +      createWithRetries(rmAppRoot, null, zkAcl, CreateMode.PERSISTENT);
          +    } catch (KeeperException ke) {
          +      if (ke.code() != Code.NODEEXISTS) {
          +        throw ke;
          +      }
          +    }
          
          • what if the first line throws an exception saying node exists but the other nodes are not created? Shouldn't each call be in its own try catch block? Or should the create function be changed to accept a parameter which when set causes the function to ignore node exists errors?

          For deleteWithRetries, the return code of exists() could be checked if a delete is required or not.

          Still need to look at the unit tests.

          Show
          Hitesh Shah added a comment - Comments: Identation still off in yarn-default.xml Not sure if a default value of "<!- value>127.0.0.1:2181</value ->" should be mentioned in yarn-default.xml. hadoop-yarn-server-resourcemanager/pom.xml is still not fixed - contains the version info. LOG.debug() should be wrapped within "if LOG.isDebugEnabled() {}" In "abstract class ZKAction<T>", member variables should be final. findbugs issue - zkSessionTimeout does not seem to be an issue but zkClient is something which is read/modified multiple times in various functions. Which functions are the ones that are/cannot be synchronized that access zkClient? + if (zkHostPort == null ) { + throw new YarnRuntimeException( + "No server address specified for zookeeper state store for Resource" + + " Manager recovery. ZK_RM_STATE_STORE_ADDRESS is not configured." ); validation only for non-null and not a valid format? ZK_RM_STATE_STORE_ADDRESS as a string? A question on general usage of zk: why is everything being stored at the top level of the tree and not a heirarchical structure i.e. attempts of a particular application stored under that application's dir? + } catch (Exception e) { + throw e; + } why the catch and re-throw? ( in multiple places ) "os.close();" - close calls should be in a finally block ( in multiple places ). "LOG.info("Error in storing " + dtSequenceNumberPath);" - change to LOG.info("Error in storing " + dtSequenceNumberPath, e);" + String getNodePath( String root, String nodeName) { + return (root + "/" + nodeName); + } does ZK have a variable to define the node path separator which we can use instead of a magic string? "LOG.info("Created new connection for " + this);" --> logging 'this'? is there a toString() function? How is a connection failure to zk handled? i.e. getNewZooKeeper() throws an exception. Does the RM fail/shutdown? Is the connection retried at a later point? When creating a Zookeeper object, ZK apis support a base root path and all operations are done relative to the base root path? Any reason why we are not using that approach by initializing zk with zkRootNodePath ? + try { + createWithRetries(znodeWorkingPath, null , zkAcl, CreateMode.PERSISTENT); + createWithRetries(zkRootNodePath, null , zkAcl, CreateMode.PERSISTENT); + createWithRetries(rmDTSecretManagerRoot, null , zkAcl, + CreateMode.PERSISTENT); + createWithRetries(rmAppRoot, null , zkAcl, CreateMode.PERSISTENT); + } catch (KeeperException ke) { + if (ke.code() != Code.NODEEXISTS) { + throw ke; + } + } what if the first line throws an exception saying node exists but the other nodes are not created? Shouldn't each call be in its own try catch block? Or should the create function be changed to accept a parameter which when set causes the function to ignore node exists errors? For deleteWithRetries, the return code of exists() could be checked if a delete is required or not. Still need to look at the unit tests.
          Hide
          Karthik Kambatla added a comment -

          Jian He/Bikas Saha, will you be able to address Hitesh's comments? If you are busy and can't get to it, I ll be happy to address them.

          Show
          Karthik Kambatla added a comment - Jian He / Bikas Saha , will you be able to address Hitesh's comments? If you are busy and can't get to it, I ll be happy to address them.
          Hide
          Jian He added a comment -

          Go ahead, thanks for that, Karthik !

          Show
          Jian He added a comment - Go ahead, thanks for that, Karthik !
          Hide
          Jian He added a comment -

          why the catch and re-throw?

          The intention was leaving there for future exception handling(HA etc). could be removed.

          why is everything being stored at the top level of the tree and not a heirarchical structure i.e. attempts of a particular application stored under that application's dir?

          ZK doesn't support directory

          How is a connection failure to zk handled? i.e. getNewZooKeeper() throws an exception. Does the RM fail/shutdown? Is the connection retried at a later point?

          If the connection is lost after RM already connected with ZK, a Disconnected event will be sent, zkClient sets to null, all operations(create/delete etc.) will wait for zkClient to be set again (verified this in cluster by stopping ZK during the connection). But if getNewZooKeeper() itself throws an exception, for now RM will fail and will not retry

          When creating a Zookeeper object, ZK apis support a base root path and all operations are done relative to the base root path? Any reason why we are not using that approach by initializing zk with zkRootNodePath ?

          didn't know this.

          For deleteWithRetries, the return code of exists() could be checked if a delete is required or not.

          this depends on whether RM wants to know the delete operation succeeds or not.

          Show
          Jian He added a comment - why the catch and re-throw? The intention was leaving there for future exception handling(HA etc). could be removed. why is everything being stored at the top level of the tree and not a heirarchical structure i.e. attempts of a particular application stored under that application's dir? ZK doesn't support directory How is a connection failure to zk handled? i.e. getNewZooKeeper() throws an exception. Does the RM fail/shutdown? Is the connection retried at a later point? If the connection is lost after RM already connected with ZK, a Disconnected event will be sent, zkClient sets to null, all operations(create/delete etc.) will wait for zkClient to be set again (verified this in cluster by stopping ZK during the connection). But if getNewZooKeeper() itself throws an exception, for now RM will fail and will not retry When creating a Zookeeper object, ZK apis support a base root path and all operations are done relative to the base root path? Any reason why we are not using that approach by initializing zk with zkRootNodePath ? didn't know this. For deleteWithRetries, the return code of exists() could be checked if a delete is required or not. this depends on whether RM wants to know the delete operation succeeds or not.
          Hide
          Hitesh Shah added a comment -

          For deleteWithRetries, the return code of exists() could be checked if a delete is required or not. this depends on whether RM wants to know the delete operation succeeds or not.

          I am not sure I understand. If the RM is trying to delete something and the node does not exist, is there a situation where the RM wants to know that the node didn't exist and fail if a non-existent node was tried to be deleted?

          Show
          Hitesh Shah added a comment - For deleteWithRetries, the return code of exists() could be checked if a delete is required or not. this depends on whether RM wants to know the delete operation succeeds or not. I am not sure I understand. If the RM is trying to delete something and the node does not exist, is there a situation where the RM wants to know that the node didn't exist and fail if a non-existent node was tried to be deleted?
          Hide
          Jian He added a comment -

          I am not sure I understand. If the RM is trying to delete something and the node does not exist, is there a situation where the RM wants to know that the node didn't exist and fail if a non-existent node was tried to be deleted?

          Agreed. We should specifically check if the node exists or not. Otherwise the ZK delete() API will throw an exception if node doesn't exist which we don't want.

          Show
          Jian He added a comment - I am not sure I understand. If the RM is trying to delete something and the node does not exist, is there a situation where the RM wants to know that the node didn't exist and fail if a non-existent node was tried to be deleted? Agreed. We should specifically check if the node exists or not. Otherwise the ZK delete() API will throw an exception if node doesn't exist which we don't want.
          Hide
          Karthik Kambatla added a comment -

          Looking into this now. Will hopefully have an update (patch + replies) sometime today.

          Show
          Karthik Kambatla added a comment - Looking into this now. Will hopefully have an update (patch + replies) sometime today.
          Hide
          Karthik Kambatla added a comment -

          While looking into the findbugs issue, noticed that there was a potential bug in ZKAction implementation.

          In YARN-353.11.patch and other previous patches, ZKAction has a local copy of zkClient. If this zkClient is null, it waits for createConnection() to connect. However, this local copy is not updated, making the wait-notify between ZKAction#runWithCheck and createConnection moot.

          Posting a wip patch (haven't tested it on a cluster yet) that

          1. gets rid of the local variables in ZKAction
          2. adds synchronization around the wait in ZKAction#runWithCheck()
          3. moves zkDoWithRetries to ZKAction#runWithRetries

          With this patch, there was no need to exclude findbugs warnings. Verified TestRMRestart passes.

          Show
          Karthik Kambatla added a comment - While looking into the findbugs issue, noticed that there was a potential bug in ZKAction implementation. In YARN-353 .11.patch and other previous patches, ZKAction has a local copy of zkClient. If this zkClient is null, it waits for createConnection() to connect. However, this local copy is not updated, making the wait-notify between ZKAction#runWithCheck and createConnection moot. Posting a wip patch (haven't tested it on a cluster yet) that gets rid of the local variables in ZKAction adds synchronization around the wait in ZKAction#runWithCheck() moves zkDoWithRetries to ZKAction#runWithRetries With this patch, there was no need to exclude findbugs warnings. Verified TestRMRestart passes.
          Hide
          Karthik Kambatla added a comment -

          Uploading patch that addresses Hitesh's comments.

          Show
          Karthik Kambatla added a comment - Uploading patch that addresses Hitesh's comments.
          Hide
          Karthik Kambatla added a comment -

          Identation still off in yarn-default.xml

          hadoop-yarn-server-resourcemanager/pom.xml is still not fixed - contains the version info.

          LOG.debug() should be wrapped within "if LOG.isDebugEnabled() {}"

          In "abstract class ZKAction<T>", member variables should be final.

          findbugs issue - zkSessionTimeout does not seem to be an issue but zkClient is something which is read/modified multiple times in various functions. Which functions are the ones that are/cannot be synchronized that access zkClient?

          "os.close();" - close calls should be in a finally block ( in multiple places ).

          "LOG.info("Error in storing " + dtSequenceNumberPath);" - change to LOG.info("Error in storing " + dtSequenceNumberPath, e);"

          "LOG.info("Created new connection for " + this);" --> logging 'this'? is there a toString() function?

          How is a connection failure to zk handled? i.e. getNewZooKeeper() throws an exception. Does the RM fail/shutdown? Is the connection retried at a later point?

          what if the first line throws an exception saying node exists but the other nodes are not created? Shouldn't each call be in its own try catch block? Or should the create function be changed to accept a parameter which when set causes the function to ignore node exists errors?

          For deleteWithRetries, the return code of exists() could be checked if a delete is required or not.

          Fixed.

          When creating a Zookeeper object, ZK apis support a base root path and all operations are done relative to the base root path? Any reason why we are not using that approach by initializing zk with zkRootNodePath ?

          We are calling createWithRetries on multiple roots

          why the catch and re-throw? ( in multiple places )

          The HA code might want to deal with those scenarios differently. Given we are on the verge of implement ZK-based HA, I think it is okay to leave these as they currently are.

          validation only for non-null and not a valid format?

          While it is a good idea to check for a valid format, the current behavior is how RM etc. deal with host:port arguments

          Not sure if a default value of "<!value>127.0.0.1:2181</value>" should be mentioned in yarn-default.xml.

          Changed the comment to be host:port. Leaving it there makes it easier for users.

          Show
          Karthik Kambatla added a comment - Identation still off in yarn-default.xml hadoop-yarn-server-resourcemanager/pom.xml is still not fixed - contains the version info. LOG.debug() should be wrapped within "if LOG.isDebugEnabled() {}" In "abstract class ZKAction<T>", member variables should be final. findbugs issue - zkSessionTimeout does not seem to be an issue but zkClient is something which is read/modified multiple times in various functions. Which functions are the ones that are/cannot be synchronized that access zkClient? "os.close();" - close calls should be in a finally block ( in multiple places ). "LOG.info("Error in storing " + dtSequenceNumberPath);" - change to LOG.info("Error in storing " + dtSequenceNumberPath, e);" "LOG.info("Created new connection for " + this);" --> logging 'this'? is there a toString() function? How is a connection failure to zk handled? i.e. getNewZooKeeper() throws an exception. Does the RM fail/shutdown? Is the connection retried at a later point? what if the first line throws an exception saying node exists but the other nodes are not created? Shouldn't each call be in its own try catch block? Or should the create function be changed to accept a parameter which when set causes the function to ignore node exists errors? For deleteWithRetries, the return code of exists() could be checked if a delete is required or not. Fixed. When creating a Zookeeper object, ZK apis support a base root path and all operations are done relative to the base root path? Any reason why we are not using that approach by initializing zk with zkRootNodePath ? We are calling createWithRetries on multiple roots why the catch and re-throw? ( in multiple places ) The HA code might want to deal with those scenarios differently. Given we are on the verge of implement ZK-based HA, I think it is okay to leave these as they currently are. validation only for non-null and not a valid format? While it is a good idea to check for a valid format, the current behavior is how RM etc. deal with host:port arguments Not sure if a default value of "<! value>127.0.0.1:2181</value >" should be mentioned in yarn-default.xml. Changed the comment to be host:port. Leaving it there makes it easier for users.
          Hide
          Karthik Kambatla added a comment -

          Kicking off Jenkins to see if it complains of anything. Yet to test the ZKRMStateStore on a cluster.

          Show
          Karthik Kambatla added a comment - Kicking off Jenkins to see if it complains of anything. Yet to test the ZKRMStateStore on a cluster.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12597034/YARN-353.12.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1681//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1681//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1681//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597034/YARN-353.12.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1681//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1681//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1681//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Not quite sure why we see the findbugs warning, it is in files that I haven't touched. Filed HDFS-5082 to handle remove zookeeper version info from hadoop-hdfs/pom.xml.

          Show
          Karthik Kambatla added a comment - Not quite sure why we see the findbugs warning, it is in files that I haven't touched. Filed HDFS-5082 to handle remove zookeeper version info from hadoop-hdfs/pom.xml.
          Hide
          Jian He added a comment -

          Looks good overall, few nits

          for (int retries = 0; retries < numRetries && zkClient == null;
                  retries++) {
                zkClient = getNewZooKeeper();
              }
          

          If getNewZooKeeper throws an Exception, it will break out of the loop, instead of retry again ? is it necessary to put a try/catch block around it ?

          We are calling createWithRetries on multiple roots

          'zkRootNodePath' is in fact the common root for 'rmDTSecretManagerRoot' and 'rmAppRoot', or we can at least new ZooKeeper along with'znodeWorkingPath'?

          I missed one change in my earlier patch, change 'yarn.resourcemanager.zk.rm-state-store.address' to 'yarn.resourcemanager.zk.state-store.address', can you fix this, thanks.

          Show
          Jian He added a comment - Looks good overall, few nits for ( int retries = 0; retries < numRetries && zkClient == null ; retries++) { zkClient = getNewZooKeeper(); } If getNewZooKeeper throws an Exception, it will break out of the loop, instead of retry again ? is it necessary to put a try/catch block around it ? We are calling createWithRetries on multiple roots 'zkRootNodePath' is in fact the common root for 'rmDTSecretManagerRoot' and 'rmAppRoot', or we can at least new ZooKeeper along with'znodeWorkingPath'? I missed one change in my earlier patch, change 'yarn.resourcemanager.zk.rm-state-store.address' to 'yarn.resourcemanager.zk.state-store.address', can you fix this, thanks.
          Hide
          Karthik Kambatla added a comment -

          Patch addressing Jian's latest comments.

          Show
          Karthik Kambatla added a comment - Patch addressing Jian's latest comments.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12597323/YARN-353.13.patch
          against trunk revision .

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1691//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597323/YARN-353.13.patch against trunk revision . -1 patch . Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1691//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Applied YARN-353.13.patch on trunk and was able to run TestRMRestart successfully. Not sure why Jenkins was unable to build.

          Re-posting the same patch.

          Show
          Karthik Kambatla added a comment - Applied YARN-353 .13.patch on trunk and was able to run TestRMRestart successfully. Not sure why Jenkins was unable to build. Re-posting the same patch.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12597326/YARN-353.13.patch
          against trunk revision .

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1692//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597326/YARN-353.13.patch against trunk revision . -1 patch . Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1692//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Tested the latest patch on pseudo-dist cluster and ran into the same problem as YARN-1058. The patch behaves very similar to the FileSystemRMStateStore implementation.

          Show
          Karthik Kambatla added a comment - Tested the latest patch on pseudo-dist cluster and ran into the same problem as YARN-1058 . The patch behaves very similar to the FileSystemRMStateStore implementation.
          Hide
          Karthik Kambatla added a comment -

          Assigned to myself for easier tracking. The patch is primarily implemented by Bikas.

          Show
          Karthik Kambatla added a comment - Assigned to myself for easier tracking. The patch is primarily implemented by Bikas.
          Hide
          Karthik Kambatla added a comment -

          Re-submitting patch now that the Jenkins (protobuf) issue has been fixed. While this still suffers from YARN-1058, we can may be work on the fix there?

          Show
          Karthik Kambatla added a comment - Re-submitting patch now that the Jenkins (protobuf) issue has been fixed. While this still suffers from YARN-1058 , we can may be work on the fix there?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12597420/YARN-353.13.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1699//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1699//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1699//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597420/YARN-353.13.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1699//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1699//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1699//console This message is automatically generated.
          Hide
          Hitesh Shah added a comment -

          Sorry for the delay in the review. Been sidetracked by other work.

          Comments:

          FS State store uses "fs.rm-state-store" where as ZK uses "zk.state-store" - this is inconsistent.
          Also, the variable names seem a bit inconsistent - should they be RM_ZK_STATE_STORE* as compared to ZK_RM_STATE_STORE* to match the actual property names? Though the property name itself has RM defined twice instead of just once.

          Use of "This must be supplied when using org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore" in yarn-default.xml - is the whole class name required? If yes, there are 1-2 properties which do not use the full class name.

          LOG.debug statement not encapsulated within if isDebugEnabled() in RMStateStore.java.

          +    } catch (Exception e) {
          +      throw e;
          +    }
          
          • Could you add comments so that this piece of code is removed when HA handling work is done.
          +      try {
          +        zkClient = getNewZooKeeper();
          +      } catch (Exception e) {
          +        if (LOG.isDebugEnabled()) {
          +          LOG.debug("Failed to connect to the ZooKeeper on attempt - " +
          +              (retries + 1));
          +        }
          +      }
          
          • Why are all exceptions being caught instead of an explicit set? If there is a good reason for all exceptions, why not catch Throwable to capture everything?
          • how is a failure in closing of zk connections meant to be handled in the createConnection function?
          • The reason for why a connection to ZK failed is never logged. The only message logged is failed to connect on attempt X and that too only in debug level.

          It seems like "throws Exception" is being used in most places as compared to known types?

          The tests for basic zk connect/CRUD probably belong in a separate file.

          +      Assert.assertTrue(e.getMessage().contains(
          +        "not of expected form scheme:id:perm"));
          
          • where is this message being produced? is the RM code validating the format or ZK, and if ZK, should we be testing this in the first place? Assuming the test is to validate that we are using the configured value properly, how about a test for a diff perm from the default and checking that the zkNode has that permission set.
          Show
          Hitesh Shah added a comment - Sorry for the delay in the review. Been sidetracked by other work. Comments: FS State store uses "fs.rm-state-store" where as ZK uses "zk.state-store" - this is inconsistent. Also, the variable names seem a bit inconsistent - should they be RM_ZK_STATE_STORE* as compared to ZK_RM_STATE_STORE* to match the actual property names? Though the property name itself has RM defined twice instead of just once. Use of "This must be supplied when using org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore" in yarn-default.xml - is the whole class name required? If yes, there are 1-2 properties which do not use the full class name. LOG.debug statement not encapsulated within if isDebugEnabled() in RMStateStore.java. + } catch (Exception e) { + throw e; + } Could you add comments so that this piece of code is removed when HA handling work is done. + try { + zkClient = getNewZooKeeper(); + } catch (Exception e) { + if (LOG.isDebugEnabled()) { + LOG.debug( "Failed to connect to the ZooKeeper on attempt - " + + (retries + 1)); + } + } Why are all exceptions being caught instead of an explicit set? If there is a good reason for all exceptions, why not catch Throwable to capture everything? how is a failure in closing of zk connections meant to be handled in the createConnection function? The reason for why a connection to ZK failed is never logged. The only message logged is failed to connect on attempt X and that too only in debug level. It seems like "throws Exception" is being used in most places as compared to known types? The tests for basic zk connect/CRUD probably belong in a separate file. + Assert.assertTrue(e.getMessage().contains( + "not of expected form scheme:id:perm" )); where is this message being produced? is the RM code validating the format or ZK, and if ZK, should we be testing this in the first place? Assuming the test is to validate that we are using the configured value properly, how about a test for a diff perm from the default and checking that the zkNode has that permission set.
          Hide
          Karthik Kambatla added a comment -

          Thanks for the detailed review, Hitesh Shah.

          YARN-353.14.patch rebases against trunk and addresses most of the comments, but depends on HADOOP-9906.

          FS State store uses "fs.rm-state-store" where as ZK uses "zk.state-store" - this is inconsistent.

          YARN-1056 fixes the FSRMStateStore prefix. Now, they are consistent.

          Also, the variable names seem a bit inconsistent - should they be RM_ZK_STATE_STORE* as compared to ZK_RM_STATE_STORE* to match the actual property names? Though the property name itself has RM defined twice instead of just once.

          The variable names mimic the class names of the RMStateStore implementations. I believe this reads better.

          LOG.debug statement not encapsulated within if isDebugEnabled() in RMStateStore.java.

          Fixed.

          Could you add comments so that this piece of code is removed when HA handling work is done.

          Fixed and filed YARN-1099 for the same.

          Why are all exceptions being caught instead of an explicit set? If there is a good reason for all exceptions, why not catch Throwable to capture everything?

          The reason for why a connection to ZK failed is never logged. The only message logged is failed to connect on attempt X and that too only in debug level.

          Fixed.

          It seems like "throws Exception" is being used in most places as compared to known types?

          Looked through and fixed where possible.

          The tests for basic zk connect/CRUD probably belong in a separate file.

          Moved to a separate file.

          where is this message being produced? is the RM code validating the format or ZK, and if ZK, should we be testing this in the first place? Assuming the test is to validate that we are using the configured value properly, how about a test for a diff perm from the default and checking that the zkNode has that permission set.

          Fixed through HADOOP-9906 and changes in this patch.

          Show
          Karthik Kambatla added a comment - Thanks for the detailed review, Hitesh Shah . YARN-353 .14.patch rebases against trunk and addresses most of the comments, but depends on HADOOP-9906 . FS State store uses "fs.rm-state-store" where as ZK uses "zk.state-store" - this is inconsistent. YARN-1056 fixes the FSRMStateStore prefix. Now, they are consistent. Also, the variable names seem a bit inconsistent - should they be RM_ZK_STATE_STORE* as compared to ZK_RM_STATE_STORE* to match the actual property names? Though the property name itself has RM defined twice instead of just once. The variable names mimic the class names of the RMStateStore implementations. I believe this reads better. LOG.debug statement not encapsulated within if isDebugEnabled() in RMStateStore.java. Fixed. Could you add comments so that this piece of code is removed when HA handling work is done. Fixed and filed YARN-1099 for the same. Why are all exceptions being caught instead of an explicit set? If there is a good reason for all exceptions, why not catch Throwable to capture everything? The reason for why a connection to ZK failed is never logged. The only message logged is failed to connect on attempt X and that too only in debug level. Fixed. It seems like "throws Exception" is being used in most places as compared to known types? Looked through and fixed where possible. The tests for basic zk connect/CRUD probably belong in a separate file. Moved to a separate file. where is this message being produced? is the RM code validating the format or ZK, and if ZK, should we be testing this in the first place? Assuming the test is to validate that we are using the configured value properly, how about a test for a diff perm from the default and checking that the zkNode has that permission set. Fixed through HADOOP-9906 and changes in this patch.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12600068/YARN-353.14.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          -1 javac. The patch appears to cause the build to fail.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1773//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12600068/YARN-353.14.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1773//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Patch that fixes yarn-default.xml descriptions.

          Show
          Karthik Kambatla added a comment - Patch that fixes yarn-default.xml descriptions.
          Hide
          Karthik Kambatla added a comment -

          Submitting patch since HADOOP-9906 is committed.

          Show
          Karthik Kambatla added a comment - Submitting patch since HADOOP-9906 is committed.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12600180/YARN-353.15.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1783//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1783//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12600180/YARN-353.15.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1783//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1783//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Hitesh Shah, when you get a chance, can you take a look at the updated patch.

          Show
          Karthik Kambatla added a comment - Hitesh Shah , when you get a chance, can you take a look at the updated patch.
          Hide
          Hitesh Shah added a comment -

          hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml

          • shouldn't there be a dependency on zookeeper even in the normal scope?
          +      if (childNodeName.startsWith(DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX)) {
          +        rmState.rmSecretManagerState.dtSequenceNumber =
          +            Integer.parseInt(childNodeName.split("_")[1]);
          +        continue;
          +      }
          
          • Could you clarify whether there can be multiple child nodes prefixed with DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX in any possible state variation?
          +          // assert child node name is same as actual applicationId
          +          assert appId.equals(appState.context.getApplicationId());
          
          • why the need for an assert? Should this check throw a runtime exception instead? (likewise for other assert checks )
          +    } catch (Exception e) {
          +      // currently throw all exceptions. May need to respond differently for HA
          +      // based on whether we have lost the right to write to ZK
          +      // TODO: Revisit this post YARN-149
          +      throw e;
          +    }
          
          • I believe its better to just remove such code and add it in with HA patches.
          +        /**
          +         * Call exists() to leave a watch on the node denoted by path.
          +         * Delete node if exists. To pass the existence information to the
          +         * caller, call delete irrespective of whether node exists or not.
          +         */
          +        if (zkClient.exists(path, true) == null) {
          +          LOG.error("Trying to delete a path (" + path
          +              + ") that doesn't exist.");
          +        } else {
          +          zkClient.delete(path, version);
          +        }
          
          • code does not match the comment ( with respect to passing of existence information )
          +    } catch (Exception e) {
          +      e.printStackTrace();
          +      Assert.fail("ZKRMStateStore Session restore failed");
          +    }
          
          • Don't think there is any need to catch the exception. The unit test will fail if the exception is not caught. If the exception stack trace in the unit test logs is not useful enough to understand the failure reason, it may be better to fix the code as needed. ( likewise in the other couple of places in the unit test where exceptions are being caught and handled with an assert.fail()
          +    Thread.sleep(800);
          
          • the zk unit test has magic sleeps of 800 ms in some cases, 500 ms in others. What is the reason for these different numbers? Does the test helper need augmenting to remove this timing related dependency?

          General minor nits:

          • 80 chars line limit exceeded in multiple files.
          Show
          Hitesh Shah added a comment - hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml shouldn't there be a dependency on zookeeper even in the normal scope? + if (childNodeName.startsWith(DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX)) { + rmState.rmSecretManagerState.dtSequenceNumber = + Integer .parseInt(childNodeName.split( "_" )[1]); + continue ; + } Could you clarify whether there can be multiple child nodes prefixed with DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX in any possible state variation? + // assert child node name is same as actual applicationId + assert appId.equals(appState.context.getApplicationId()); why the need for an assert? Should this check throw a runtime exception instead? (likewise for other assert checks ) + } catch (Exception e) { + // currently throw all exceptions. May need to respond differently for HA + // based on whether we have lost the right to write to ZK + // TODO: Revisit this post YARN-149 + throw e; + } I believe its better to just remove such code and add it in with HA patches. + /** + * Call exists() to leave a watch on the node denoted by path. + * Delete node if exists. To pass the existence information to the + * caller, call delete irrespective of whether node exists or not. + */ + if (zkClient.exists(path, true ) == null ) { + LOG.error( "Trying to delete a path (" + path + + ") that doesn't exist." ); + } else { + zkClient.delete(path, version); + } code does not match the comment ( with respect to passing of existence information ) + } catch (Exception e) { + e.printStackTrace(); + Assert.fail( "ZKRMStateStore Session restore failed" ); + } Don't think there is any need to catch the exception. The unit test will fail if the exception is not caught. If the exception stack trace in the unit test logs is not useful enough to understand the failure reason, it may be better to fix the code as needed. ( likewise in the other couple of places in the unit test where exceptions are being caught and handled with an assert.fail() + Thread .sleep(800); the zk unit test has magic sleeps of 800 ms in some cases, 500 ms in others. What is the reason for these different numbers? Does the test helper need augmenting to remove this timing related dependency? General minor nits: 80 chars line limit exceeded in multiple files.
          Hide
          Karthik Kambatla added a comment -

          Thanks again for the detailed review, Hitesh Shah. Sorry for the slow turnaround on this.

          } catch (Exception e) {
            e.printStackTrace();
            Assert.fail("ZKRMStateStore Session restore failed");
          }
          

          Don't think there is any need to catch the exception. The unit test will fail if the exception is not caught.

          Agree. The difference is in what the test reports shows this as - Failure or Error. The general rule I try to follow is - errors encountered in the specific code we are testing should be marked failures. I am not particular about this though, let me know if you would like me to make that change as well. The updated patch addresses some of these inconsistencies as well.

          Show
          Karthik Kambatla added a comment - Thanks again for the detailed review, Hitesh Shah . Sorry for the slow turnaround on this. } catch (Exception e) { e.printStackTrace(); Assert.fail( "ZKRMStateStore Session restore failed" ); } Don't think there is any need to catch the exception. The unit test will fail if the exception is not caught. Agree. The difference is in what the test reports shows this as - Failure or Error. The general rule I try to follow is - errors encountered in the specific code we are testing should be marked failures. I am not particular about this though, let me know if you would like me to make that change as well. The updated patch addresses some of these inconsistencies as well.
          Hide
          Karthik Kambatla added a comment -

          The latest patch addresses Hitesh's latest set of review comments. Some of the spurious looking changes in TestRMRestart or to fix lines longer than 80 chars. I tested this on a pseudo-dist cluster:

          1. Started the RM using ZKRMStateStore
          2. Ran jobs - all ZKnodes are created as expected
          3. Restarted the RM several times while a job is running - the job succeeds.
          Show
          Karthik Kambatla added a comment - The latest patch addresses Hitesh's latest set of review comments. Some of the spurious looking changes in TestRMRestart or to fix lines longer than 80 chars. I tested this on a pseudo-dist cluster: Started the RM using ZKRMStateStore Ran jobs - all ZKnodes are created as expected Restarted the RM several times while a job is running - the job succeeds.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12603419/YARN-353.16.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1939//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1939//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603419/YARN-353.16.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1939//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1939//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Hitesh Shah, can you please take a look when you get a chance? I believe the latest patch addresses all the comments. Thanks.

          Show
          Karthik Kambatla added a comment - Hitesh Shah , can you please take a look when you get a chance? I believe the latest patch addresses all the comments. Thanks.
          Hide
          Hitesh Shah added a comment -

          Karthik Kambatla Thanks for the patience. Looks good. I missed a minor point on my previous comments.

          +    } catch (Exception e) {
          +      LOG.error("Failed to load state.", e);
          +      throw e;
          +    }
          

          I removed the above snippet as it was not adding any useful information and attached the 16.1 patch.

          Let me know if you have any concerns.

          Will wait for jenkins to +1 and commit sometime tomorrow.

          Show
          Hitesh Shah added a comment - Karthik Kambatla Thanks for the patience. Looks good. I missed a minor point on my previous comments. + } catch (Exception e) { + LOG.error( "Failed to load state." , e); + throw e; + } I removed the above snippet as it was not adding any useful information and attached the 16.1 patch. Let me know if you have any concerns. Will wait for jenkins to +1 and commit sometime tomorrow.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12603992/YARN-353.16.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1969//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1969//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603992/YARN-353.16.1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1969//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1969//console This message is automatically generated.
          Hide
          Karthik Kambatla added a comment -

          Thanks Hitesh. Dropping the try-catch blocks seems reasonable to me.

          Show
          Karthik Kambatla added a comment - Thanks Hitesh. Dropping the try-catch blocks seems reasonable to me.
          Hide
          Bikas Saha added a comment -

          We should be storing the information inside the data of the znode instead of encoding it in the name of the znode. The rename pattern was followed as an optimization in FSStore but is not necessary in ZKStore give set-with-overwrite operations. Its also better to avoid encoding the info in the name since doing "ls" in ZK is more relaxed wrt permissions than it is in FileSystem.

          +      if (childNodeName.startsWith(DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX)) {
          +        rmState.rmSecretManagerState.dtSequenceNumber =
          +            Integer.parseInt(childNodeName.split("_")[1])
          

          We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore.

          storeRMDTMasterKeyState() is not using multi-operation. We should be issuing every action on ZK using a multi-operation for fencing to work.

          I am fine with the patch as is since it puts the framework in place. The above comments may be addressed in YARN-1222 thats opened under YARN-149 for HA. YARN-1222 blocks YARN-1026.

          Show
          Bikas Saha added a comment - We should be storing the information inside the data of the znode instead of encoding it in the name of the znode. The rename pattern was followed as an optimization in FSStore but is not necessary in ZKStore give set-with-overwrite operations. Its also better to avoid encoding the info in the name since doing "ls" in ZK is more relaxed wrt permissions than it is in FileSystem. + if (childNodeName.startsWith(DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX)) { + rmState.rmSecretManagerState.dtSequenceNumber = + Integer .parseInt(childNodeName.split( "_" )[1]) We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. storeRMDTMasterKeyState() is not using multi-operation. We should be issuing every action on ZK using a multi-operation for fencing to work. I am fine with the patch as is since it puts the framework in place. The above comments may be addressed in YARN-1222 thats opened under YARN-149 for HA. YARN-1222 blocks YARN-1026 .
          Hide
          Hitesh Shah added a comment -

          Thanks Bikas, Jian and Karthik. Committed to trunk and branch-2.

          Arun C Murthy Vinod Kumar Vavilapalli Let me know if this needs to go into branch-2.1.

          Show
          Hitesh Shah added a comment - Thanks Bikas, Jian and Karthik. Committed to trunk and branch-2. Arun C Murthy Vinod Kumar Vavilapalli Let me know if this needs to go into branch-2.1.
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #4443 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4443/)
          YARN-353. Add Zookeeper-based store implementation for RMStateStore. Contributed by Bikas Saha, Jian He and Karthik Kambatla. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524829)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java
          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #4443 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4443/ ) YARN-353 . Add Zookeeper-based store implementation for RMStateStore. Contributed by Bikas Saha, Jian He and Karthik Kambatla. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524829 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
          Hide
          Arun C Murthy added a comment -

          No, I think we are good. branch-2.1 is too close for major additions. Thanks.

          Show
          Arun C Murthy added a comment - No, I think we are good. branch-2.1 is too close for major additions. Thanks.
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #338 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/338/)
          YARN-353. Add Zookeeper-based store implementation for RMStateStore. Contributed by Bikas Saha, Jian He and Karthik Kambatla. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524829)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java
          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #338 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/338/ ) YARN-353 . Add Zookeeper-based store implementation for RMStateStore. Contributed by Bikas Saha, Jian He and Karthik Kambatla. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524829 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1554 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1554/)
          YARN-353. Add Zookeeper-based store implementation for RMStateStore. Contributed by Bikas Saha, Jian He and Karthik Kambatla. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524829)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java
          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1554 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1554/ ) YARN-353 . Add Zookeeper-based store implementation for RMStateStore. Contributed by Bikas Saha, Jian He and Karthik Kambatla. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524829 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1528 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1528/)
          YARN-353. Add Zookeeper-based store implementation for RMStateStore. Contributed by Bikas Saha, Jian He and Karthik Kambatla. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524829)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java
          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1528 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1528/ ) YARN-353 . Add Zookeeper-based store implementation for RMStateStore. Contributed by Bikas Saha, Jian He and Karthik Kambatla. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524829 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ClientBaseWithFixes.java /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java

            People

            • Assignee:
              Karthik Kambatla
              Reporter:
              Hitesh Shah
            • Votes:
              0 Vote for this issue
              Watchers:
              21 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development