  1. Hadoop YARN
  2. YARN-3385

Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion.

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Race condition: KeeperException$NoNodeException causes RM shutdown during ZK node deletion (Op.delete).
      The race condition is similar to the one in YARN-3023.
      Since the race condition exists for ZK node creation, it should also exist for ZK node deletion.
      We see this issue with the following stack trace:

      2015-03-17 19:18:58,958 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
      org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
      	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
      	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:647)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:691)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
      	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
      	at java.lang.Thread.run(Thread.java:745)
      2015-03-17 19:18:58,959 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
      
      1. YARN-3385.000.patch
        8 kB
        zhihai xu
      2. YARN-3385.001.patch
        8 kB
        zhihai xu
      3. YARN-3385.002.patch
        10 kB
        zhihai xu
      4. YARN-3385.003.patch
        10 kB
        zhihai xu
      5. YARN-3385.004.patch
        10 kB
        zhihai xu

        Issue Links

          Activity

          zxu zhihai xu added a comment -

          The sequence for the race condition is as follows:
          1. The RM tries to remove the state of application application_1426560404988_0132 from ZKRMStateStore.

          2015-03-17 19:18:48,075 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of completed apps kept in state store met: maxCompletedAppsInStateStore = 10000, removing app application_1426560404988_0132 from state store.
          2015-03-17 19:18:48,075 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing info for app: application_1426560404988_0132
          

          2. Unluckily, a ConnectionLoss on the ZK session happened at the same time as the RM was removing the application state from ZK.
          The ZooKeeper server deleted the node successfully, but because of the ConnectionLoss the RM did not know that the operation had succeeded.

          2015-03-17 19:18:51,836 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
          org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
          

          3. The RM retried removing the application state from ZK.

          2015-03-17 19:18:51,837 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Retrying operation on ZK. Retry no. 1
          

          4. During the retry, the ZK session was reconnected.

          2015-03-17 19:18:58,924 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server, sessionid = 0x24be28f536e2006, negotiated timeout = 10000
          

          5. Because the node had already been deleted at the ZooKeeper server by the previous operation, the retry failed with a NoNode KeeperException.

          2015-03-17 19:18:58,956 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
          org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
          2015-03-17 19:18:58,956 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up!
          

          6. This NoNode KeeperException caused the app removal to fail in RMStateStore.

          2015-03-17 19:18:58,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error removing app: application_1426560404988_0132
          org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
          

          7. RMStateStore then sent an RMFatalEventType.STATE_STORE_OP_FAILED event to the ResourceManager.

            protected void notifyStoreOperationFailed(Exception failureCause) {
              RMFatalEventType type;
              if (failureCause instanceof StoreFencedException) {
                type = RMFatalEventType.STATE_STORE_FENCED;
              } else {
                type = RMFatalEventType.STATE_STORE_OP_FAILED;
              }
              rmDispatcher.getEventHandler().handle(new RMFatalEvent(type, failureCause));
            }
          

          8. The ResourceManager killed itself after receiving the STATE_STORE_OP_FAILED RMFatalEvent.

          2015-03-17 19:18:58,958 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
          org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
          2015-03-17 19:18:58,959 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
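
          The eight steps above can be reproduced in miniature without a real ZooKeeper ensemble. The following is only a sketch under stated assumptions: FlakyZkSketch, loseNextReply and naiveDeleteWithRetries are hypothetical names invented for illustration, the "server" is an in-memory set, and the retry loop is a simplification of what ZKRMStateStore does rather than a copy of it.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for a ZooKeeper server; not the real client API. */
class FlakyZkSketch {
  static class ConnectionLoss extends Exception {}
  static class NoNode extends Exception {}

  private final Set<String> nodes = new HashSet<>();
  private boolean dropNextReply;

  void create(String path) {
    nodes.add(path);
  }

  /** Make the next delete apply on the "server" but lose the reply. */
  void loseNextReply() {
    dropNextReply = true;
  }

  void delete(String path) throws ConnectionLoss, NoNode {
    if (!nodes.remove(path)) {
      throw new NoNode();
    }
    if (dropNextReply) {
      dropNextReply = false;
      // The node is already gone, but the caller never learns that.
      throw new ConnectionLoss();
    }
  }

  /** Naive retry: retries on ConnectionLoss and treats NoNode as a fatal failure. */
  static String naiveDeleteWithRetries(FlakyZkSketch zk, String path, int maxRetries) {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        zk.delete(path);
        return "OK";
      } catch (ConnectionLoss e) {
        // Outcome unknown; fall through and retry.
      } catch (NoNode e) {
        return "STATE_STORE_OP_FAILED";
      }
    }
    return "STATE_STORE_OP_FAILED";
  }

  public static void main(String[] args) {
    FlakyZkSketch zk = new FlakyZkSketch();
    zk.create("/rmstore/app_0132");
    zk.loseNextReply();
    // First attempt deletes the node but reports ConnectionLoss; the retry sees NoNode.
    System.out.println(naiveDeleteWithRetries(zk, "/rmstore/app_0132", 1));
    // prints "STATE_STORE_OP_FAILED"
  }
}
```

          The point of the sketch is that the failure is reported even though the store ends up in exactly the state the caller wanted.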
          
          zxu zhihai xu added a comment -

          I uploaded a patch, YARN-3385.000.patch, for review. The patch handles NoNodeException for both Op.delete and zkClient.delete, and optimizes removeRMDelegationTokenState to skip the ZK delete operation if the node doesn't exist.

          Without the patch, the test fails with the following message:

          -------------------------------------------------------
           T E S T S
          -------------------------------------------------------
          Running org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
          Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.853 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
          testRMAppDeleteNoNodeException(org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore)  Time elapsed: 1.253 sec  <<< FAILURE!
          java.lang.AssertionError: NoNodeException should not happen.
          	at org.junit.Assert.fail(Assert.java:88)
          	at org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore.testRMAppDeleteNoNodeException(TestZKRMStateStore.java:405)
          Results :
          Failed tests: 
            TestZKRMStateStore.testRMAppDeleteNoNodeException:405 NoNodeException should not happen.
          Tests run: 5, Failures: 1, Errors: 0, Skipped: 0
          
          org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
          	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
          	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
          	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
          	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:920)
          	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:916)
          	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1080)
          	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1101)
          	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:916)
          	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:928)
          	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:697)
          	at org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore.testRMAppDelete(TestZKRMStateStore.java:401)
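
          The two ideas in the patch description (skip the delete when the node is absent, and do not treat NoNode as a failure when deleting) can be sketched as follows. This is an illustration only, not the patch itself: Store, removeIfPresent and deleteCalls are hypothetical names, and the in-memory set stands in for ZooKeeper.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: skip a delete when the node is absent, tolerate NoNode otherwise. */
class SkipOrTolerateDeleteSketch {
  static class NoNode extends Exception {}

  /** In-memory stand-in for the state store backend. */
  static class Store {
    final Set<String> nodes = new HashSet<>();
    int deleteCalls = 0;

    boolean exists(String path) {
      return nodes.contains(path);
    }

    void delete(String path) throws NoNode {
      deleteCalls++;
      if (!nodes.remove(path)) {
        throw new NoNode();
      }
    }
  }

  static void removeIfPresent(Store store, String path) {
    if (!store.exists(path)) {
      return; // Nothing to delete, so skip the round trip entirely.
    }
    try {
      store.delete(path);
    } catch (NoNode e) {
      // The node vanished between exists() and delete(). The desired end
      // state (node absent) already holds, so this is not a failure.
    }
  }

  public static void main(String[] args) {
    Store store = new Store();
    store.nodes.add("/tokens/t1");
    removeIfPresent(store, "/tokens/t1");
    removeIfPresent(store, "/tokens/t1"); // second call is a no-op
    System.out.println(store.deleteCalls); // prints 1
  }
}
```

          The existence check alone is not enough, because check-then-delete is itself racy; that is why the sketch keeps the catch block as well.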
          
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12706414/YARN-3385.000.patch
          against trunk revision 4cd54d9.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7069//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7069//console

          This message is automatically generated.

          sidharta-s Sidharta Seethana added a comment -

          zhihai xu, could you please rebase the patch? It doesn't seem to apply.

          zxu zhihai xu added a comment -

          Thanks Sidharta Seethana, I uploaded a new patch YARN-3385.001.patch based on the latest code base.

          jianhe Jian He added a comment -

          Thanks zhihai xu! I'll review this.
          zk.delete does not seem to be idempotent.
          In general, I think that if we have YARN-2716, this problem can be resolved along with it. Do you think so?

          zxu zhihai xu added a comment -

          Agreed. If we have YARN-2716, this problem may be solved with it. Thanks Jian He!
          It may take some time to stabilize YARN-2716; in the interim, it will be useful to fix this issue.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          || Vote || Subsystem || Runtime || Comment ||
          | 0 | pre-patch | 15m 9s | Pre-patch trunk compilation is healthy. |
          | +1 | @author | 0m 0s | The patch does not contain any @author tags. |
          | +1 | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
          | +1 | javac | 7m 49s | There were no new javac warning messages. |
          | +1 | javadoc | 10m 2s | There were no new javadoc warning messages. |
          | +1 | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
          | -1 | checkstyle | 0m 48s | The applied patch generated 1 new checkstyle issues (total was 42, now 43). |
          | +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
          | +1 | install | 1m 35s | mvn install still works. |
          | +1 | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
          | +1 | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
          | -1 | yarn tests | 49m 49s | Tests failed in hadoop-yarn-server-resourcemanager. |
          | | | 87m 25s | |

          || Reason || Tests ||
          | Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
          | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |

          || Subsystem || Report/Notes ||
          | Patch URL | http://issues.apache.org/jira/secure/attachment/12729901/YARN-3385.001.patch |
          | Optional Tests | javadoc javac unit findbugs checkstyle |
          | git revision | trunk / 6ae2a0d |
          | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7656/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
          | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7656/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
          | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7656/testReport/ |
          | Java | 1.7.0_55 |
          | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
          | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7656/console |

          This message was automatically generated.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Thanks for working on this, zhihai xu!

          Some comments:

          • Some commands contain multiple delete operations, and only one of them may fail. Do you know what happens to the remaining ops?
          • Please rename the existing doMultiWithRetries() to doStoreMultiWithRetries() and create a new doDeleteMultiWithRetries().

          testRMAppDeleteNoNodeException()

          • conf is unused.
          • Rename the test to testDuplicateRMAppDeletion().
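
          One way to read the suggested split is that only the delete path should tolerate a missing node, while the store path keeps reporting it. The sketch below is an assumption about that intent, not the code from any of the attached patches: the method names are shortened from the review comment, retries are omitted, and the bodies operate on a hypothetical in-memory map instead of ZooKeeper. The main method plays the role of a duplicate-deletion test.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of separate store and delete batch paths. */
class SplitMultiSketch {
  static class NoNode extends Exception {}

  final Map<String, String> nodes = new HashMap<>();

  /** Update-style batch: a missing node is a real error and is surfaced. */
  void doStoreMulti(Map<String, String> updates) throws NoNode {
    for (String path : updates.keySet()) {
      if (!nodes.containsKey(path)) {
        throw new NoNode();
      }
    }
    nodes.putAll(updates);
  }

  /** Delete-style batch: a missing node is read as "an earlier attempt already succeeded". */
  void doDeleteMulti(List<String> paths) {
    try {
      rawDeleteMulti(paths);
    } catch (NoNode e) {
      // Already deleted; nothing left to do.
    }
  }

  private void rawDeleteMulti(List<String> paths) throws NoNode {
    for (String path : paths) {
      if (!nodes.containsKey(path)) {
        throw new NoNode();
      }
    }
    for (String path : paths) {
      nodes.remove(path);
    }
  }

  public static void main(String[] args) throws Exception {
    SplitMultiSketch store = new SplitMultiSketch();
    store.nodes.put("/apps/app1", "state");
    List<String> paths = Arrays.asList("/apps/app1");
    store.doDeleteMulti(paths);
    store.doDeleteMulti(paths); // duplicate deletion must not throw
    System.out.println(store.nodes.isEmpty()); // prints true
  }
}
```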
          zxu zhihai xu added a comment -

          Vinod Kumar Vavilapalli, thanks for the thorough review.

          Some commands have multiple delete commands and only one of them may fail ......

          According to the documentation for ZooKeeper#multi ("Executes multiple ZooKeeper operations or none of them."), a batch of delete commands is atomic: either all of the delete operations are done or none of them are. So I think that if any one of them fails, the remaining ops won't be done.

          I uploaded a new patch, YARN-3385.002.patch, which addresses all your comments. Please review it.

          zxu zhihai xu added a comment -

          By the way, I forgot to mention: if a NoNodeException happens because of this race condition, it means that one of the delete operations was already done. Because zkClient.multi executes either all of the Ops or none of them, all of the delete operations must have been done. This is a good article which talks about multi update for ZooKeeper.
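
          The all-or-nothing argument can be illustrated with a small in-memory model. This is a sketch only: AtomicMultiDeleteSketch and multiDelete are hypothetical names, and the model validates every path before applying any delete, which is a simplification of how a real ZooKeeper transaction behaves.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of an all-or-nothing batch of deletes. */
class AtomicMultiDeleteSketch {
  static class NoNode extends Exception {
    NoNode(String path) {
      super(path);
    }
  }

  final Set<String> nodes = new HashSet<>();

  /** Validate every path first, then apply, so a failure leaves the store untouched. */
  void multiDelete(List<String> paths) throws NoNode {
    for (String path : paths) {
      if (!nodes.contains(path)) {
        throw new NoNode(path);
      }
    }
    nodes.removeAll(paths);
  }

  public static void main(String[] args) {
    AtomicMultiDeleteSketch zk = new AtomicMultiDeleteSketch();
    zk.nodes.addAll(Arrays.asList("/app", "/app/attempt1"));
    try {
      zk.multiDelete(Arrays.asList("/app/attempt1", "/missing"));
    } catch (NoNode e) {
      // One op failed, so none were applied.
      System.out.println(zk.nodes.size()); // prints 2
    }
  }
}
```

          Under this model, a batch that was applied once and is then retried fails with NoNode on its first path, and the store is already empty of every node in the batch, which is the situation described in the comment above.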

          hadoopqa Hadoop QA added a comment -



          -1 overall



          || Vote || Subsystem || Runtime || Comment ||
          | 0 | pre-patch | 17m 57s | Pre-patch trunk compilation is healthy. |
          | +1 | @author | 0m 0s | The patch does not contain any @author tags. |
          | +1 | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
          | +1 | javac | 9m 49s | There were no new javac warning messages. |
          | +1 | javadoc | 10m 58s | There were no new javadoc warning messages. |
          | +1 | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
          | -1 | checkstyle | 0m 51s | The applied patch generated 1 new checkstyle issues (total was 42, now 43). |
          | +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
          | +1 | install | 1m 45s | mvn install still works. |
          | +1 | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. |
          | +1 | findbugs | 1m 42s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
          | -1 | yarn tests | 56m 31s | Tests failed in hadoop-yarn-server-resourcemanager. |
          | | | 100m 44s | |

          || Reason || Tests ||
          | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |

          || Subsystem || Report/Notes ||
          | Patch URL | http://issues.apache.org/jira/secure/attachment/12730392/YARN-3385.002.patch |
          | Optional Tests | javadoc javac unit findbugs checkstyle |
          | git revision | trunk / 9b01f81 |
          | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7711/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
          | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7711/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
          | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7711/testReport/ |
          | Java | 1.7.0_55 |
          | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
          | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7711/console |

          This message was automatically generated.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          This looks good, +1. Checking this in..

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Actually, there is a checkstyle warning and a test-related problem. Please look at them.

          zxu zhihai xu added a comment -

          I attached a new patch, YARN-3385.003.patch, which fixes the checkstyle issue. Also, it is strange that the test report log didn't show any test failure.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          || Vote || Subsystem || Runtime || Comment ||
          | 0 | pre-patch | 14m 59s | Pre-patch trunk compilation is healthy. |
          | +1 | @author | 0m 0s | The patch does not contain any @author tags. |
          | +1 | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
          | +1 | javac | 7m 44s | There were no new javac warning messages. |
          | +1 | javadoc | 10m 37s | There were no new javadoc warning messages. |
          | +1 | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
          | -1 | checkstyle | 0m 53s | The applied patch generated 1 new checkstyle issues (total was 42, now 43). |
          | +1 | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
          | +1 | install | 1m 44s | mvn install still works. |
          | +1 | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. |
          | +1 | findbugs | 1m 23s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
          | -1 | yarn tests | 54m 23s | Tests failed in hadoop-yarn-server-resourcemanager. |
          | | | 92m 54s | |

          || Reason || Tests ||
          | Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |

          || Subsystem || Report/Notes ||
          | Patch URL | http://issues.apache.org/jira/secure/attachment/12730608/YARN-3385.003.patch |
          | Optional Tests | javadoc javac unit findbugs checkstyle |
          | git revision | trunk / 0100b15 |
          | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7713/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
          | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7713/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
          | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7713/testReport/ |
          | Java | 1.7.0_55 |
          | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
          | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7713/console |

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 38s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 35s There were no new javac warning messages.
          +1 javadoc 9m 39s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 51s The applied patch generated 1 new checkstyle issues (total was 42, now 43).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 33s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 1m 16s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          -1 yarn tests 52m 28s Tests failed in hadoop-yarn-server-resourcemanager.
              89m 1s  



          Reason Tests
          Failed unit tests hadoop.yarn.server.resourcemanager.TestRMRestart



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12730639/YARN-3385.003.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 9809a16
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/7715/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7715/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7715/testReport/
          Java 1.7.0_55
          uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7715/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 56s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 37s There were no new javac warning messages.
          +1 javadoc 9m 41s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 46s The applied patch generated 1 new checkstyle issues (total was 42, now 43).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 35s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 15s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          -1 yarn tests 62m 58s Tests failed in hadoop-yarn-server-resourcemanager.
              99m 52s  



          Reason Tests
          Timed out tests org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12730672/YARN-3385.003.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 90b3845
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/7721/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7721/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7721/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7721/console

          This message was automatically generated.

          zxu zhihai xu added a comment -

          I attached a new patch, YARN-3385.004.patch, to fix the new checkstyle issue. It is strange that the testReport shows no test failure.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 30s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 31s There were no new javac warning messages.
          +1 javadoc 9m 30s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 49s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 36s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 16s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 yarn tests 53m 31s Tests passed in hadoop-yarn-server-resourcemanager.
              89m 53s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12730709/YARN-3385.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / a583a40
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7726/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7726/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7726/console

          This message was automatically generated.

          zxu zhihai xu added a comment -

          Hi Vinod Kumar Vavilapalli, I fixed the checkstyle warning in the latest patch, YARN-3385.004.patch, and all the tests passed.
          Could you review it? Thanks.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          yup, +1, checking this in..

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Committed this to trunk and branch-2. Also pulled this into 2.7 as this race condition can crash RM. Thanks zhihai xu!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7753 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7753/)
          YARN-3385. Fixed a race-condition in ResourceManager's ZooKeeper based state-store to avoid crashing on duplicate deletes. Contributed by Zhihai Xu. (vinodkv: rev 4c7b9b6abe2452c9752a11214762be2e7665fb32)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
          • hadoop-yarn-project/CHANGES.txt
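
          The commit message above describes the fix as avoiding a crash on duplicate deletes. The committed patch itself is not shown in this thread, so the following is only a minimal sketch of that general idea, assuming a delete of an already-absent node can be treated as success. `FakeStore`, `NoNodeException`, and `safeDelete` are hypothetical stand-ins for illustration, not the real ZooKeeper or ZKRMStateStore API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentDeleteSketch {

    /** Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("NoNode for " + path);
        }
    }

    /** Hypothetical in-memory stand-in for a ZooKeeper namespace. */
    static class FakeStore {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();

        void create(String path) {
            nodes.add(path);
        }

        boolean exists(String path) {
            return nodes.contains(path);
        }

        /** Mirrors the failure mode in the report: deleting a missing node throws. */
        void delete(String path) throws NoNodeException {
            if (!nodes.remove(path)) {
                throw new NoNodeException(path);
            }
        }
    }

    /**
     * Deletes the node at path. Returns true if this call removed it and
     * false if it was already gone (for example, removed by an earlier
     * attempt whose response was lost before a retry).
     */
    static boolean safeDelete(FakeStore store, String path) {
        try {
            store.delete(path);
            return true;
        } catch (NoNodeException e) {
            // The node is already absent, which is the desired end state,
            // so report it instead of escalating to a fatal store failure.
            return false;
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.create("/rmstore/app_0001");
        System.out.println(safeDelete(store, "/rmstore/app_0001")); // prints "true"
        System.out.println(safeDelete(store, "/rmstore/app_0001")); // prints "false"
    }
}
```

          The second call models a retried delete: the node is already gone, and the caller sees an ordinary return value rather than an exception that would otherwise propagate as a fatal state-store error.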
          zxu zhihai xu added a comment -

          Thanks, Vinod Kumar Vavilapalli, for reviewing and committing the patch! Greatly appreciated.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #187 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/187/)
          YARN-3385. Fixed a race-condition in ResourceManager's ZooKeeper based state-store to avoid crashing on duplicate deletes. Contributed by Zhihai Xu. (vinodkv: rev 4c7b9b6abe2452c9752a11214762be2e7665fb32)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #920 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/920/)
          YARN-3385. Fixed a race-condition in ResourceManager's ZooKeeper based state-store to avoid crashing on duplicate deletes. Contributed by Zhihai Xu. (vinodkv: rev 4c7b9b6abe2452c9752a11214762be2e7665fb32)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2118 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2118/)
          YARN-3385. Fixed a race-condition in ResourceManager's ZooKeeper based state-store to avoid crashing on duplicate deletes. Contributed by Zhihai Xu. (vinodkv: rev 4c7b9b6abe2452c9752a11214762be2e7665fb32)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #177 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/177/)
          YARN-3385. Fixed a race-condition in ResourceManager's ZooKeeper based state-store to avoid crashing on duplicate deletes. Contributed by Zhihai Xu. (vinodkv: rev 4c7b9b6abe2452c9752a11214762be2e7665fb32)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #187 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/187/)
          YARN-3385. Fixed a race-condition in ResourceManager's ZooKeeper based state-store to avoid crashing on duplicate deletes. Contributed by Zhihai Xu. (vinodkv: rev 4c7b9b6abe2452c9752a11214762be2e7665fb32)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2136 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2136/)
          YARN-3385. Fixed a race-condition in ResourceManager's ZooKeeper based state-store to avoid crashing on duplicate deletes. Contributed by Zhihai Xu. (vinodkv: rev 4c7b9b6abe2452c9752a11214762be2e7665fb32)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java

            People

            • Assignee:
              zxu zhihai xu
              Reporter:
              zxu zhihai xu
            • Votes:
              0
              Watchers:
              12

              Dates

              • Created:
                Updated:
                Resolved:

                Development