Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-11295

RPC Server Reader thread can't shutdown if RPCCallQueue is full

    Details

    • Hadoop Flags:
      Reviewed

      Description

      If RPC server is asked to stop when RPCCallQueue is full, reader.join() will just wait there. That is because

      1. The reader thread is blocked on callQueue.put(call);.
      2. When RPC server is asked to stop, it will interrupt all handler threads and thus no threads will drain the callQueue.

      1. HADOOP-11295.006.patch
        4 kB
        Gera Shegalov
      2. HADOOP-11295.branch-2.6.patch
        4 kB
        Akira Ajisaka
      3. HADOOP-11295.patch
        3 kB
        Ming Ma
      4. HADOOP-11295-2.patch
        3 kB
        Ming Ma
      5. HADOOP-11295-3.patch
        3 kB
        Ming Ma
      6. HADOOP-11295-4.patch
        4 kB
        Gera Shegalov
      7. HADOOP-11295-5.patch
        4 kB
        Gera Shegalov

        Issue Links

          Activity

          Hide
          mingma Ming Ma added a comment -

          The fix is to interrupt the reader threads when RPC Server is stopped.

          Show
          mingma Ming Ma added a comment - The fix is to interrupt the reader threads when RPC Server is stopped.
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12681159/HADOOP-11295.patch
          against trunk revision 782abbb.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5070//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5070//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12681159/HADOOP-11295.patch against trunk revision 782abbb. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5070//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5070//console This message is automatically generated.
          Hide
          mingma Ming Ma added a comment -

          Rebased for trunk. Appreciate any input.

          Show
          mingma Ming Ma added a comment - Rebased for trunk. Appreciate any input.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12688254/HADOOP-11295-2.patch
          against trunk revision 6635ccd.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5310//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5310//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5310//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688254/HADOOP-11295-2.patch against trunk revision 6635ccd. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5310//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5310//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5310//console This message is automatically generated.
          Hide
          sjlee0 Sangjin Lee added a comment -

          LGTM. Thanks Ming Ma!

          Show
          sjlee0 Sangjin Lee added a comment - LGTM. Thanks Ming Ma !
          Hide
          jira.shegalov Gera Shegalov added a comment -

          Thanks for reporting the issue, Ming Ma! The fix in Server.java is correct.
          Some comments to testRPCServerShutdown:

          • Use of the qualifier final is inconsistent. More variables (conf, clientN) can be made final, or get rid of it.
          • Variable addr is used only once, please inline it as a parameter of getProxy for clarity
          • Don't swallow InterruptedException around in the while loops around sleep they can be legit. I think It's fine to to remove try/catch and have the test fail.
          • 1000 millis is a little excessive. Let us do 100
          • Move the client interrupt/join logic into the finally block past server.stop. Then you actually don't need to call client.interrupt because client calls will exit on its own.
          • Use client.join() without the timeout parameter, we are already protected by the Test timeout from blocking indefinitely.
          Show
          jira.shegalov Gera Shegalov added a comment - Thanks for reporting the issue, Ming Ma ! The fix in Server.java is correct. Some comments to testRPCServerShutdown : Use of the qualifier final is inconsistent. More variables (conf, clientN) can be made final, or get rid of it. Variable addr is used only once, please inline it as a parameter of getProxy for clarity Don't swallow InterruptedException around in the while loops around sleep they can be legit. I think It's fine to to remove try/catch and have the test fail. 1000 millis is a little excessive. Let us do 100 Move the client interrupt/join logic into the finally block past server.stop. Then you actually don't need to call client.interrupt because client calls will exit on its own. Use client.join() without the timeout parameter, we are already protected by the Test timeout from blocking indefinitely.
          Hide
          mingma Ming Ma added a comment -

          Thanks Gera Shegalov and Sangjin Lee. Here is the updated patch to address all the comments.

          Show
          mingma Ming Ma added a comment - Thanks Gera Shegalov and Sangjin Lee . Here is the updated patch to address all the comments.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12688424/HADOOP-11295-3.patch
          against trunk revision 8fa265a.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5322//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5322//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5322//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688424/HADOOP-11295-3.patch against trunk revision 8fa265a. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5322//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5322//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5322//console This message is automatically generated.
          Hide
          jira.shegalov Gera Shegalov added a comment -

          Thanks for updating the patch, Ming Ma!

          A couple more comments, we are close. To reproduce the problem in your setup, we need 3 concurrent clients. Thus we don't need to have wait loops after each individual clientX.start(). So just after last client3.start we can wait until all threads are active using the TestRPC#countThreads methods:

                while (countThreads("TestRPC$SleepRPCClient") != 3) {
                  Thread.sleep(100);
                }
          

          After that we need to have another single sleep to be sure that the server-side problem is reproduced instead of while (!client3.isAlive()) loop.

          if (proxy != null) is unnecessary because proxy is known to be non-null in the finally.

          In SleepRPCClient#run, we should catch Throwable and LOG it instead of completely ignoring the exception for better debugability.

          Show
          jira.shegalov Gera Shegalov added a comment - Thanks for updating the patch, Ming Ma ! A couple more comments, we are close. To reproduce the problem in your setup, we need 3 concurrent clients. Thus we don't need to have wait loops after each individual clientX.start() . So just after last client3.start we can wait until all threads are active using the TestRPC#countThreads methods: while (countThreads( "TestRPC$SleepRPCClient" ) != 3) { Thread .sleep(100); } After that we need to have another single sleep to be sure that the server-side problem is reproduced instead of while (!client3.isAlive()) loop. if (proxy != null) is unnecessary because proxy is known to be non-null in the finally. In SleepRPCClient#run , we should catch Throwable and LOG it instead of completely ignoring the exception for better debugability.
          Hide
          jira.shegalov Gera Shegalov added a comment -

          Regarding SleepRPCClient exceptions. It's probably even better to use a Callable or to check otherwise whether only the expected InterruptedException is thrown. Otherwise the test may succeed because all calls failed.

          Show
          jira.shegalov Gera Shegalov added a comment - Regarding SleepRPCClient exceptions. It's probably even better to use a Callable or to check otherwise whether only the expected InterruptedException is thrown. Otherwise the test may succeed because all calls failed.
          Hide
          jira.shegalov Gera Shegalov added a comment -

          Ming Ma, I took the liberty to incorporate my nit comments myself while you are on a break. Let me know whether it looks good to you when get a chance.

          It would also be good if some other committer takes a look as well.

          Show
          jira.shegalov Gera Shegalov added a comment - Ming Ma , I took the liberty to incorporate my nit comments myself while you are on a break. Let me know whether it looks good to you when get a chance. It would also be good if some other committer takes a look as well.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12688707/HADOOP-11295-4.patch
          against trunk revision a696fbb.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.ha.TestZKFailoverControllerStress
          org.apache.hadoop.ipc.TestRPC

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5331//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5331//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5331//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688707/HADOOP-11295-4.patch against trunk revision a696fbb. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverControllerStress org.apache.hadoop.ipc.TestRPC Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5331//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5331//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5331//console This message is automatically generated.
          Hide
          jira.shegalov Gera Shegalov added a comment -

          relaxed ConnectException to IOException as the expected exception.

          Show
          jira.shegalov Gera Shegalov added a comment - relaxed ConnectException to IOException as the expected exception.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12688791/HADOOP-11295-5.patch
          against trunk revision fdf042d.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.ha.TestZKFailoverControllerStress

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5333//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5333//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5333//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688791/HADOOP-11295-5.patch against trunk revision fdf042d. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverControllerStress Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5333//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5333//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5333//console This message is automatically generated.
          Hide
          mingma Ming Ma added a comment -

          Thanks, Gera Shegalov! Good idea to use ExecutorService. One minor thing, in the check if all client calls have been received by the RPC server as in server.getCallQueueLen() != 1 && countThreads(CallQueueManager.class.getName()) != 1, it will be useful to add another check to make sure the handler thread is blocked, something like countThreads(TestProtocol.class.getName()) != 1. This will cover the case where handler thread is slow to start up and thus the test code leaves the while loop a bit early. Otherwise, it looks good.

          Show
          mingma Ming Ma added a comment - Thanks, Gera Shegalov ! Good idea to use ExecutorService. One minor thing, in the check if all client calls have been received by the RPC server as in server.getCallQueueLen() != 1 && countThreads(CallQueueManager.class.getName()) != 1 , it will be useful to add another check to make sure the handler thread is blocked, something like countThreads(TestProtocol.class.getName()) != 1 . This will cover the case where handler thread is slow to start up and thus the test code leaves the while loop a bit early. Otherwise, it looks good.
          Hide
          daryn Daryn Sharp added a comment -

          Per #2 in the description, I haven't thought about it too hard, but should we consider a more graceful shutdown? Close the listener, and interrupt the readers, interrupt the handlers?

          Show
          daryn Daryn Sharp added a comment - Per #2 in the description, I haven't thought about it too hard, but should we consider a more graceful shutdown? Close the listener, and interrupt the readers, interrupt the handlers?
          Hide
          mingma Ming Ma added a comment -

          Thanks, Daryn Sharp. Yes, we can change the shutdown order from the current " handlers -> listener -> readers -> responder" to "listener -> readers -> handlers -> responder". But I don't know if it makes much of difference given the server is being shutdown anyway. If you meant draining the RPC queue before the server is shutdown, then we will have to wait for all requests to be processed by handlers and responder; it might not be worthy it. Thoughts?

          Show
          mingma Ming Ma added a comment - Thanks, Daryn Sharp . Yes, we can change the shutdown order from the current " handlers -> listener -> readers -> responder" to "listener -> readers -> handlers -> responder". But I don't know if it makes much of difference given the server is being shutdown anyway. If you meant draining the RPC queue before the server is shutdown, then we will have to wait for all requests to be processed by handlers and responder; it might not be worthy it. Thoughts?
          Hide
          jira.shegalov Gera Shegalov added a comment -

          Incorporating Ming's suggestion in 006. I am +1 for this version since it's non-invasive. Daryn Sharp, do you mind handling your suggestion in a separate JIRA?

          Show
          jira.shegalov Gera Shegalov added a comment - Incorporating Ming's suggestion in 006. I am +1 for this version since it's non-invasive. Daryn Sharp , do you mind handling your suggestion in a separate JIRA?
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12699227/HADOOP-11295.006.patch
          against trunk revision 500e6a0.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5721//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5721//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699227/HADOOP-11295.006.patch against trunk revision 500e6a0. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5721//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5721//console This message is automatically generated.
          Hide
          kihwal Kihwal Lee added a comment -

          +1 I think it is fine for now. We may improve the shutdown later.

          Show
          kihwal Kihwal Lee added a comment - +1 I think it is fine for now. We may improve the shutdown later.
          Hide
          kihwal Kihwal Lee added a comment -

          Committed to trunk and branch-2.

          Show
          kihwal Kihwal Lee added a comment - Committed to trunk and branch-2.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7138 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7138/)
          HADOOP-11295. RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #7138 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7138/ ) HADOOP-11295 . RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          mingma Ming Ma added a comment -

          Thanks, Gera, Sangjin, Daryn and Kihwal.

          Show
          mingma Ming Ma added a comment - Thanks, Gera, Sangjin, Daryn and Kihwal.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          Hi Kihwal Lee, I don't see the patch is committed to branch-2. Would you commit it?

          Show
          ajisakaa Akira Ajisaka added a comment - Hi Kihwal Lee , I don't see the patch is committed to branch-2. Would you commit it?
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Kihwal Lee cc: Akira Ajisaka I checked this JIRA, and you did +1 for branch-2 commit. I cherry-picked from the commit 685af8a3d0504724fe588daf3722519fedc45b01 to branch-2 since it causes conflicts of other commits. Please revert it if you find a problem.

          Show
          ozawa Tsuyoshi Ozawa added a comment - Kihwal Lee cc: Akira Ajisaka I checked this JIRA, and you did +1 for branch-2 commit. I cherry-picked from the commit 685af8a3d0504724fe588daf3722519fedc45b01 to branch-2 since it causes conflicts of other commits. Please revert it if you find a problem.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #842 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/842/)
          HADOOP-11295. RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #842 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/842/ ) HADOOP-11295 . RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01) hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #108 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/108/)
          HADOOP-11295. RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #108 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/108/ ) HADOOP-11295 . RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01) hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #99 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/99/)
          HADOOP-11295. RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #99 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/99/ ) HADOOP-11295 . RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #2040 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2040/)
          HADOOP-11295. RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #2040 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2040/ ) HADOOP-11295 . RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01) hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java
          Hide
          kihwal Kihwal Lee added a comment -

          Akira Ajisaka Tsuyoshi Ozawa, sorry, apparently I didn't push it after the cherry-pick. Thanks for noticing and fixing it.

          Show
          kihwal Kihwal Lee added a comment - Akira Ajisaka Tsuyoshi Ozawa , sorry, apparently I didn't push it after the cherry-pick. Thanks for noticing and fixing it.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #109 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/109/)
          HADOOP-11295. RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01)

          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #109 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/109/ ) HADOOP-11295 . RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2059 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2059/)
          HADOOP-11295. RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01)

          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2059 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2059/ ) HADOOP-11295 . RPC Server Reader thread can't shutdown if RPCCallQueue is full. Contributed by Ming Ma. (kihwal: rev 685af8a3d0504724fe588daf3722519fedc45b01) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Hide
          ajisakaa Akira Ajisaka added a comment -

          Rebased the patch for branch-2.6.

          Show
          ajisakaa Akira Ajisaka added a comment - Rebased the patch for branch-2.6.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Pulled this into 2.6.1 following Akira Ajisaka's patch. Ran compilation and TestRPC before the push.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Pulled this into 2.6.1 following Akira Ajisaka 's patch. Ran compilation and TestRPC before the push.
          Hide
          xiaochen Xiao Chen added a comment -

          Hi Ming Ma and Gera Shegalov,

          I was reading up this JIRA, and the fix in Server.java looks great. Thanks for the contribution!
          However, in the test case in the latest patch, IMO instead of checking:
          {{
          while (server.getCallQueueLen() != 1
          && countThreads(CallQueueManager.class.getName()) != 1
          && countThreads(TestProtocol.class.getName()) != 1)

          { Thread.sleep(100); } }}
          we should check:
          {{
          while (server.getCallQueueLen() != 1
          || countThreads(CallQueueManager.class.getName()) != 1
          || countThreads(TestProtocol.class.getName()) != 1) { Thread.sleep(100);}}}

          to guarantee that the threads are in expected states when we later call server.stop().

          Also, it appears to me that the class name for the test protocol is org.apache.hadoop.ipc.TestRPC$TestImpl, as we setInstance(new TestImpl()) when creating the server. So I suggest instead of checking countThreads(TestProtocol.class.getName()), we can check countThreads(TestImpl.class.getName())

          Please let me know if this makes sense to you. Thanks.

          Show
          xiaochen Xiao Chen added a comment - Hi Ming Ma and Gera Shegalov , I was reading up this JIRA, and the fix in Server.java looks great. Thanks for the contribution! However, in the test case in the latest patch, IMO instead of checking: {{ while (server.getCallQueueLen() != 1 && countThreads(CallQueueManager.class.getName()) != 1 && countThreads(TestProtocol.class.getName()) != 1) { Thread.sleep(100); } }} we should check: {{ while (server.getCallQueueLen() != 1 || countThreads(CallQueueManager.class.getName()) != 1 || countThreads(TestProtocol.class.getName()) != 1) { Thread.sleep(100);}}} to guarantee that the threads are in expected states when we later call server.stop() . Also, it appears to me that the class name for the test protocol is org.apache.hadoop.ipc.TestRPC$TestImpl, as we setInstance(new TestImpl()) when creating the server. So I suggest instead of checking countThreads(TestProtocol.class.getName()) , we can check countThreads(TestImpl.class.getName()) Please let me know if this makes sense to you. Thanks.
          Hide
          mingma Ming Ma added a comment -

          Good catch Xiao Chen! Please go ahead to attach your patch in a new jira.

          Show
          mingma Ming Ma added a comment - Good catch Xiao Chen ! Please go ahead to attach your patch in a new jira.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Ming. I have created HADOOP-12440 and linked to this JIRA.

          Show
          xiaochen Xiao Chen added a comment - Thanks Ming. I have created HADOOP-12440 and linked to this JIRA.

            People

            • Assignee:
              mingma Ming Ma
              Reporter:
              mingma Ming Ma
            • Votes:
              0 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development