Attachments
Attachments
- YARN-1764.1.patch
- 31 kB
- Xuan Gong
- YARN-1764.2.patch
- 16 kB
- Xuan Gong
- YARN-1764.3.patch
- 17 kB
- Vinod Kumar Vavilapalli
Issue Links
Activity
create a test class: RMHATestBase, and move many common logics from TestKillApplicationWithRMHA to RMHATestBase. Also create TestSubmitApplicationWithRMHA which can re-use this common logics
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12632776/YARN-1764.1.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3258//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3258//console
This message is automatically generated.
I think this patch conflicts with YARN-1410 w.r.t test-code. Let's create an order.
Depending on where YARN-1410 goes (see my latest comment there), the code comments referring to YARN-1410 may or may not be needed in this patch.
Some more code level comments
- Can you add a log in YarnClientImpl when we retry the submission?
- Can you improvement the documentation of submitApp() API in ApplicationClientProtocol about the clients needing to retry when the specified exception happens?
- Also add the exception to the documentation to the base protocol.
- Document YarnClient's submit API that we automatically retry when this issue happens.
- All the new files added in the patch have some formatting issues.
- In both the test-cases, after the fail-over, we assert for the states that are not expected (assertFalse). Can we explicitly test for the cases that we expect (assertTrue) ?
I think we should also mark getApplicationReport() to be idempotent in this patch itself as RM can fail-over after submitApplication() returned but during a getApplicationReport(). We will need to add some tests for this too.
Can you add a log in YarnClientImpl when we retry the submission?
DONE
Can you improvement the documentation of submitApp() API in ApplicationClientProtocol about the clients needing to retry when the specified exception happens?
ADDED
Also add the exception to the documentation to the base protocol.
ADDED
Document YarnClient's submit API that we automatically retry when this issue happens.
ADDED
All the new files added in the patch have some formatting issues.
FIXED
In both the test-cases, after the fail-over, we assert for the states that are not expected (assertFalse). Can we explicitly test for the cases that we expect (assertTrue) ?
changed
I think we should also mark getApplicationReport() to be idempotent in this patch itself as RM can fail-over after submitApplication() returned but during a getApplicationReport(). We will need to add some tests for this too.
ADDED
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12633623/YARN-1764.2.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3307//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3307//console
This message is automatically generated.
The latest patch looks good to me but for the javadocs. Here's an updated patch with better javadoc fixes.
Will commit this once Jenkins says okay..
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12633802/YARN-1764.3.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3314//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3314//console
This message is automatically generated.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12633802/YARN-1764.3.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3315//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3315//console
This message is automatically generated.
Committed this to trunk, branch-2 and branch-2.4. Thanks Xuan!
SUCCESS: Integrated in Hadoop-trunk-Commit #5302 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5302/)
YARN-1764. Modified YarnClient to correctly handle failover of ResourceManager after the submitApplication call goes through. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576160)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestSubmitApplicationWithRMHA.java
SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/506/)
YARN-1764. Modified YarnClient to correctly handle failover of ResourceManager after the submitApplication call goes through. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576160)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestSubmitApplicationWithRMHA.java
SUCCESS: Integrated in Hadoop-Hdfs-trunk #1698 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1698/)
YARN-1764. Modified YarnClient to correctly handle failover of ResourceManager after the submitApplication call goes through. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576160)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestSubmitApplicationWithRMHA.java
SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/)
YARN-1764. Modified YarnClient to correctly handle failover of ResourceManager after the submitApplication call goes through. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576160)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestSubmitApplicationWithRMHA.java
Let us continue our discussions on case 3: Handle RM fail overs after the submitApplication call.
Reply to kkambatl‘s comment:
“ I don't see 3 to be as straight-forward, and suspect would require revisiting the state machine.”
We will only consider the case that failover happens after submitApplication call. It means when failover happens, we have already received the SubmitApplicationResponse.
When the failover happens, we will not re-entry clientRMService#submitApplication() again. What will happen next is that getApplicationReport() will start to execute. And YarnClient will start to re-try until it finds the next active RM, and continue execute getApplicationReport().
Now we have two cases to handle:
For case1, we do not need to make any changes.
For case2, if the failover happens, when we try to execute getApplicationReport, we will get ApplicationNotFoundException. I think this is the only case we should handle here.