Hadoop YARN
  1. Hadoop YARN
  2. YARN-763

AMRMClientAsync should stop heartbeating after receiving shutdown from RM

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0-beta
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    1. YARN-763.1.patch
      4 kB
      Xuan Gong
    2. YARN-763.2.patch
      5 kB
      Xuan Gong
    3. YARN-763.3.patch
      6 kB
      Xuan Gong
    4. YARN-763.4.patch
      2 kB
      Xuan Gong
    5. YARN-763.5.patch
      6 kB
      Xuan Gong
    6. YARN-763.6.patch
      6 kB
      Xuan Gong
    7. YARN-763.7.patch
      7 kB
      Xuan Gong
    8. YARN-763.8.patch
      8 kB
      Xuan Gong

      Activity

      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12589525/YARN-763.1.patch
      against trunk revision .

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 1 new or modified test files.

      -1 javac. The applied patch generated 1152 javac compiler warnings (more than the trunk's current 1151 warnings).

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 eclipse:eclipse. The patch built with eclipse:eclipse.

      +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

      org.apache.hadoop.yarn.client.api.impl.TestNMClient

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1393//testReport/
      Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1393//artifact/trunk/patchprocess/diffJavacWarnings.txt
      Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1393//console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589525/YARN-763.1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. -1 javac . The applied patch generated 1152 javac compiler warnings (more than the trunk's current 1151 warnings). +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1393//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1393//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1393//console This message is automatically generated.
      Hide
      Sandy Ryza added a comment -

      Can we move all of this into the switch statement, replace break with return, and get rid of the stop variable? Unless the thinking is that returning from a method in the middle is bad, I think this would be a lot cleaner.

      Show
      Sandy Ryza added a comment - Can we move all of this into the switch statement, replace break with return, and get rid of the stop variable? Unless the thinking is that returning from a method in the middle is bad, I think this would be a lot cleaner.
      Hide
      Xuan Gong added a comment -

      1. remove the boolean stop
      2. put heartbeat thread interrupt logic inside the switch block

      Show
      Xuan Gong added a comment - 1. remove the boolean stop 2. put heartbeat thread interrupt logic inside the switch block
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12590473/YARN-763.2.patch
      against trunk revision .

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 1 new or modified test files.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 eclipse:eclipse. The patch built with eclipse:eclipse.

      +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

      org.apache.hadoop.yarn.client.api.impl.TestNMClient

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1415//testReport/
      Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1415//console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12590473/YARN-763.2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1415//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1415//console This message is automatically generated.
      Hide
      Bikas Saha added a comment -

      By the time the callback thread handles the shutdown request, the heartbeat thread may have already pinged the RM multiple times and we should ideally avoid that. e.g. since each time the RM will end up sending it a resync/shutdown or might fail it.
      Ideally, the heartbeater thread should check the command and stop as needed so that there are no subsequent heartbeats.

      Not quite clear what the test is testing? The thing to be tested is that there should not be an allocate call made by the heartbeater thread after it has been sent a shutdown command by the RM. I dont quite see anything that verifies this behavior.

      Secondly, there is a lot of probably unnecessary code in the test. I dont think multiple responses after shutdown or mocking client.getAvailableResources is required.

      +    final AllocateResponse response1 = createAllocateResponse(
      +        new ArrayList<ContainerStatus>(), allocated1, null);
      +    final AllocateResponse response2 = createAllocateResponse(completed1,
      +        new ArrayList<Container>(), null);
      +    final AllocateResponse shutDownResponse = createAllocateResponse(
      +        new ArrayList<ContainerStatus>(), new ArrayList<Container>(), null);
      +    shutDownResponse.setAMCommand(AMCommand.AM_SHUTDOWN);
      +
      +    TestCallbackHandler callbackHandler = new TestCallbackHandler();
      +    final AMRMClient<ContainerRequest> client = mock(AMRMClientImpl.class);
      +    when(client.allocate(anyFloat())).thenReturn(shutDownResponse)
      +        .thenReturn(response1).thenReturn(response2);
      +
      +    when(client.registerApplicationMaster(anyString(), anyInt(), anyString()))
      +      .thenReturn(null);
      +    when(client.getAvailableResources()).thenAnswer(new Answer<Resource>() {
      +      @Override
      +      public Resource answer(InvocationOnMock invocation)
      +          throws Throwable {
      +        // take client lock to simulate behavior of real impl
      +        synchronized (client) {
      +          Thread.sleep(10);
      +        }
      +        return null;
      +      }
      +    });
      

      On a different note, serviceStop() should not call join() on the heartbeater thread. While serviceStop() blocks on the join() it may be holding onto application locks in its call tree. The callback thread might be waiting on those locks as it upcalls to the app code. Resulting in a deadlock. However, we should ensure the JVM is not hung because of any issue on this thread. So we should mark the callback thread as a daemon so that the JVM exits even if that thread is running.

      Show
      Bikas Saha added a comment - By the time the callback thread handles the shutdown request, the heartbeat thread may have already pinged the RM multiple times and we should ideally avoid that. e.g. since each time the RM will end up sending it a resync/shutdown or might fail it. Ideally, the heartbeater thread should check the command and stop as needed so that there are no subsequent heartbeats. Not quite clear what the test is testing? The thing to be tested is that there should not be an allocate call made by the heartbeater thread after it has been sent a shutdown command by the RM. I dont quite see anything that verifies this behavior. Secondly, there is a lot of probably unnecessary code in the test. I dont think multiple responses after shutdown or mocking client.getAvailableResources is required. + final AllocateResponse response1 = createAllocateResponse( + new ArrayList<ContainerStatus>(), allocated1, null ); + final AllocateResponse response2 = createAllocateResponse(completed1, + new ArrayList<Container>(), null ); + final AllocateResponse shutDownResponse = createAllocateResponse( + new ArrayList<ContainerStatus>(), new ArrayList<Container>(), null ); + shutDownResponse.setAMCommand(AMCommand.AM_SHUTDOWN); + + TestCallbackHandler callbackHandler = new TestCallbackHandler(); + final AMRMClient<ContainerRequest> client = mock(AMRMClientImpl.class); + when(client.allocate(anyFloat())).thenReturn(shutDownResponse) + .thenReturn(response1).thenReturn(response2); + + when(client.registerApplicationMaster(anyString(), anyInt(), anyString())) + .thenReturn( null ); + when(client.getAvailableResources()).thenAnswer( new Answer<Resource>() { + @Override + public Resource answer(InvocationOnMock invocation) + throws Throwable { + // take client lock to simulate behavior of real impl + synchronized (client) { + Thread .sleep(10); + } + return null ; + } + }); On a different note, serviceStop() should not call join() on the heartbeater thread. While serviceStop() blocks on the join() it may be holding onto application locks in its call tree. The callback thread might be waiting on those locks as it upcalls to the app code. Resulting in a deadlock. However, we should ensure the JVM is not hung because of any issue on this thread. So we should mark the callback thread as a daemon so that the JVM exits even if that thread is running.
      Hide
      Xuan Gong added a comment -

      On a different note, serviceStop() should not call join() on the heartbeater thread. While serviceStop() blocks on the join() it may be holding onto application locks in its call tree. The callback thread might be waiting on those locks as it upcalls to the app code. Resulting in a deadlock. However, we should ensure the JVM is not hung because of any issue on this thread. So we should mark the callback thread as a daemon so that the JVM exits even if that thread is running.

      If we set the callback as daemon thread, calling join() on the heartBeater thread will be fine.

      Show
      Xuan Gong added a comment - On a different note, serviceStop() should not call join() on the heartbeater thread. While serviceStop() blocks on the join() it may be holding onto application locks in its call tree. The callback thread might be waiting on those locks as it upcalls to the app code. Resulting in a deadlock. However, we should ensure the JVM is not hung because of any issue on this thread. So we should mark the callback thread as a daemon so that the JVM exits even if that thread is running. If we set the callback as daemon thread, calling join() on the heartBeater thread will be fine.
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12591349/YARN-763.3.patch
      against trunk revision .

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 1 new or modified test files.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 eclipse:eclipse. The patch built with eclipse:eclipse.

      +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

      org.apache.hadoop.yarn.client.api.async.impl.TestAMRMClientAsync

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1432//testReport/
      Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1432//console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591349/YARN-763.3.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.async.impl.TestAMRMClientAsync +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1432//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1432//console This message is automatically generated.
      Hide
      Xuan Gong added a comment -

      fix the test case failure

      Show
      Xuan Gong added a comment - fix the test case failure
      Hide
      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12591357/YARN-763.5.patch
      against trunk revision .

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 1 new or modified test files.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 eclipse:eclipse. The patch built with eclipse:eclipse.

      +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1436//testReport/
      Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1436//console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591357/YARN-763.5.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1436//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1436//console This message is automatically generated.
      Hide
      Jian He added a comment -

      The test is only testing the heartbeat thread is cancelled after the SHUTDOWN is received. It might be the case that after the SHUTDOWN is received, still make more allocate calls (though unlikely from the code base) and then the thread is cancelled. I think we need a way to explicitly test no more allocate calls are made after SHUTDOWN is received as bikas said earlier.

      Show
      Jian He added a comment - The test is only testing the heartbeat thread is cancelled after the SHUTDOWN is received. It might be the case that after the SHUTDOWN is received, still make more allocate calls (though unlikely from the code base) and then the thread is cancelled. I think we need a way to explicitly test no more allocate calls are made after SHUTDOWN is received as bikas said earlier.
      Hide
      Xuan Gong added a comment -

      Thanks for the comments, Jian. I miss that part.

      Add the verification to test no more allocate calls are made after SHUTDOWN is received.

      Show
      Xuan Gong added a comment - Thanks for the comments, Jian. I miss that part. Add the verification to test no more allocate calls are made after SHUTDOWN is received.
      Hide
      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12591466/YARN-763.6.patch
      against trunk revision .

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 1 new or modified test files.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 eclipse:eclipse. The patch built with eclipse:eclipse.

      +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1438//testReport/
      Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1438//console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591466/YARN-763.6.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1438//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1438//console This message is automatically generated.
      Hide
      Bikas Saha added a comment -

      Changes mostly look good.

      This assert is not 100% stable because the test code thread can come to this point before the heartbeat thread manages to exit. These are all on different threads that may lost the cpu at arbitrary times. We can remove it. I am not too paranoid given that we are already checking for a single allocate call in the mockClient. We can avoid adding this extra test only method to the main class.

      +    Assert.assertFalse(((AMRMClientAsyncImpl) asyncClient)
      +        .isHeartbeatThreadAlive());
      

      If we change the heartbeat thread sleep interval to 1ms and sleep 5ms after we get the reboot notification in the test code, then we probably get 100% surety in less time overall compared to the 20ms heartbeat interval being used currently.

      If we set the callback as daemon thread, calling join() on the heartBeater thread will be fine.

      I dont think thats correct. join() will still get stuck. So we should continue to interrupt the callback thread but not call join() on it. The original patch for Async client did not have a join() in it. My bad in having it added.

      Show
      Bikas Saha added a comment - Changes mostly look good. This assert is not 100% stable because the test code thread can come to this point before the heartbeat thread manages to exit. These are all on different threads that may lost the cpu at arbitrary times. We can remove it. I am not too paranoid given that we are already checking for a single allocate call in the mockClient. We can avoid adding this extra test only method to the main class. + Assert.assertFalse(((AMRMClientAsyncImpl) asyncClient) + .isHeartbeatThreadAlive()); If we change the heartbeat thread sleep interval to 1ms and sleep 5ms after we get the reboot notification in the test code, then we probably get 100% surety in less time overall compared to the 20ms heartbeat interval being used currently. If we set the callback as daemon thread, calling join() on the heartBeater thread will be fine. I dont think thats correct. join() will still get stuck. So we should continue to interrupt the callback thread but not call join() on it. The original patch for Async client did not have a join() in it. My bad in having it added.
      Hide
      Bikas Saha added a comment -

      we also need to remove the check in serviceStop() that it is being called on the callback handler thread. not needed anymore since we dont join() on that thread. we can add a test that we can call asyncClient.stop() on same thread as callback from asyncClient.

      Show
      Bikas Saha added a comment - we also need to remove the check in serviceStop() that it is being called on the callback handler thread. not needed anymore since we dont join() on that thread. we can add a test that we can call asyncClient.stop() on same thread as callback from asyncClient.
      Hide
      Xuan Gong added a comment -

      Add new patch to address all the comments

      Show
      Xuan Gong added a comment - Add new patch to address all the comments
      Hide
      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12591769/YARN-763.7.patch
      against trunk revision .

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 1 new or modified test files.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 eclipse:eclipse. The patch built with eclipse:eclipse.

      +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1452//testReport/
      Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1452//console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591769/YARN-763.7.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1452//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1452//console This message is automatically generated.
      Hide
      Jian He added a comment -

      testCallAMRMClientAsyncStopFromCallbackHandler:
      In the test case, asyncClient.stop() is still called from the main test thread, not from callback thread. we need a way to simulate callback thread can actually call asyncClient.stop()
      we should also prevent the main test thread from exiting before the callback thread actually completes calling asyncClient.stop().

      Show
      Jian He added a comment - testCallAMRMClientAsyncStopFromCallbackHandler: In the test case, asyncClient.stop() is still called from the main test thread, not from callback thread. we need a way to simulate callback thread can actually call asyncClient.stop() we should also prevent the main test thread from exiting before the callback thread actually completes calling asyncClient.stop().
      Hide
      Xuan Gong added a comment -

      Oh, You are right. If we want to let CallBackThread to call asyncClient.stop(), we might need to add this part of code inside the CallBackThread.run(). In that case, we may need to create a new test class, such as mockAMRMClientAsync, and re-write CallBackThread.run().

      Any other ideas ?

      Show
      Xuan Gong added a comment - Oh, You are right. If we want to let CallBackThread to call asyncClient.stop(), we might need to add this part of code inside the CallBackThread.run(). In that case, we may need to create a new test class, such as mockAMRMClientAsync, and re-write CallBackThread.run(). Any other ideas ?
      Hide
      Bikas Saha added a comment -

      We can simply create another version of the TestCallbackHandler that calls asyncClient.stop() when asyncClient calls its getProgress() method. After the stop() has completed the method can set a flag and notifyAll(this). The main test thread can wait() on the handler object and check that the flag is set when it gets notified. Else it waits again. This way if the callback thread is deadlocked then test thread will not exit and the test will fail with timeout. To verify, the test should fail with join() and pass without it. Similar logic is used in other tests.

      Please lets not sleep(1000) as this just slows down the testing. Lets sleep(50) and set the heartbeat interval to 10. This allows for 5 heartbeats and so the verification that actual heartbeat count == 1 is accurate.

      Show
      Bikas Saha added a comment - We can simply create another version of the TestCallbackHandler that calls asyncClient.stop() when asyncClient calls its getProgress() method. After the stop() has completed the method can set a flag and notifyAll(this). The main test thread can wait() on the handler object and check that the flag is set when it gets notified. Else it waits again. This way if the callback thread is deadlocked then test thread will not exit and the test will fail with timeout. To verify, the test should fail with join() and pass without it. Similar logic is used in other tests. Please lets not sleep(1000) as this just slows down the testing. Lets sleep(50) and set the heartbeat interval to 10. This allows for 5 heartbeats and so the verification that actual heartbeat count == 1 is accurate.
      Hide
      Xuan Gong added a comment -

      Yes. Modified the testCallAMRMClientAsyncStopFromCallbackHandler. This test will fail with join() and pass without it.

      Show
      Xuan Gong added a comment - Yes. Modified the testCallAMRMClientAsyncStopFromCallbackHandler. This test will fail with join() and pass without it.
      Hide
      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12591879/YARN-763.8.patch
      against trunk revision .

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 1 new or modified test files.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 eclipse:eclipse. The patch built with eclipse:eclipse.

      +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1459//testReport/
      Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1459//console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591879/YARN-763.8.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1459//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1459//console This message is automatically generated.
      Hide
      Bikas Saha added a comment -

      There was no need to remove join() on the heartbeat thread too. I fixed that minor issue and committed to trunk, branch-2 and branch-2.1-beta. Thanks Xuan!

      Show
      Bikas Saha added a comment - There was no need to remove join() on the heartbeat thread too. I fixed that minor issue and committed to trunk, branch-2 and branch-2.1-beta. Thanks Xuan!
      Hide
      Hudson added a comment -

      SUCCESS: Integrated in Hadoop-trunk-Commit #4080 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4080/)
      YARN-763. AMRMClientAsync should stop heartbeating after receiving shutdown from RM (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502914)

      • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
      • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
      • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
      Show
      Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #4080 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4080/ ) YARN-763 . AMRMClientAsync should stop heartbeating after receiving shutdown from RM (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502914 ) /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
      Hide
      Hudson added a comment -

      FAILURE: Integrated in Hadoop-Yarn-trunk #270 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/270/)
      YARN-763. AMRMClientAsync should stop heartbeating after receiving shutdown from RM (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502914)

      • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
      • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
      • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
      Show
      Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #270 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/270/ ) YARN-763 . AMRMClientAsync should stop heartbeating after receiving shutdown from RM (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502914 ) /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
      Hide
      Hudson added a comment -

      FAILURE: Integrated in Hadoop-Hdfs-trunk #1460 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1460/)
      YARN-763. AMRMClientAsync should stop heartbeating after receiving shutdown from RM (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502914)

      • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
      • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
      • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
      Show
      Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1460 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1460/ ) YARN-763 . AMRMClientAsync should stop heartbeating after receiving shutdown from RM (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502914 ) /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
      Hide
      Hudson added a comment -

      FAILURE: Integrated in Hadoop-Mapreduce-trunk #1487 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1487/)
      YARN-763. AMRMClientAsync should stop heartbeating after receiving shutdown from RM (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502914)

      • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
      • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
      • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
      Show
      Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1487 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1487/ ) YARN-763 . AMRMClientAsync should stop heartbeating after receiving shutdown from RM (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502914 ) /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java

        People

        • Assignee:
          Xuan Gong
          Reporter:
          Bikas Saha
        • Votes:
          0 Vote for this issue
          Watchers:
          9 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development