Hadoop Common
  1. Hadoop Common
  2. HADOOP-2188

RPC should send a ping rather than use client timeouts

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.1
    • Fix Version/s: 0.18.0
    • Component/s: ipc
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Hide
      Replaced timeouts with pings to check that client connection is alive. Removed the property ipc.client.timeout from the default Hadoop configuration. Removed the metric RpcOpsDiscardedOPsNum.
      Show
      Replaced timeouts with pings to check that client connection is alive. Removed the property ipc.client.timeout from the default Hadoop configuration. Removed the metric RpcOpsDiscardedOPsNum.

      Description

      Current RPC (really IPC) relies on client side timeouts to find "dead" sockets. I propose that we have a thread that once a minute (if the connection has been idle) writes a "ping" message to the socket. The client can detect a dead socket by the resulting error on the write, so no client side timeout is required. Also note that the ipc server does not need to respond to the ping, just discard it.

      1. ipc-timeout9.patch
        54 kB
        Hairong Kuang
      2. ipc-timeout8.patch
        54 kB
        Hairong Kuang
      3. ipc-timeout7.patch
        53 kB
        Hairong Kuang
      4. ipc-timeout6.patch
        51 kB
        Hairong Kuang
      5. ipc-timeout5.patch
        50 kB
        Hairong Kuang
      6. ipc-timeout4.patch
        49 kB
        Hairong Kuang
      7. ipc-timeout3.patch
        48 kB
        Hairong Kuang
      8. ipc-timeout2.patch
        47 kB
        Hairong Kuang
      9. ipc-timeout1.patch
        48 kB
        Hairong Kuang
      10. ipc-timeout.patch
        34 kB
        Hairong Kuang
      11. rpc-to.patch
        27 kB
        Owen O'Malley

        Issue Links

          Activity

          Hide
          Devaraj Das added a comment -

          Owen, could you please clarify this. So, the way things work is that if a client connection has been left idle too long (controlled via ipc.client.connection.maxidletime), the client closes the connection. This is to handle scalability issues to do with connection caching (as discussed in HADOOP-312). What you are proposing will effectively enable connection caching. Did I get you right? (BTW, for the task->tasktracker communication, the idle timeout is set to a very high number since the tasktracker server has to deal with only a very few number of clients).

          Show
          Devaraj Das added a comment - Owen, could you please clarify this. So, the way things work is that if a client connection has been left idle too long (controlled via ipc.client.connection.maxidletime), the client closes the connection. This is to handle scalability issues to do with connection caching (as discussed in HADOOP-312 ). What you are proposing will effectively enable connection caching. Did I get you right? (BTW, for the task->tasktracker communication, the idle timeout is set to a very high number since the tasktracker server has to deal with only a very few number of clients).
          Hide
          Owen O'Malley added a comment -

          No, the ping will only be sent if there are outstanding rpc calls, not when the connection is idle. It is still right to close the connection if it has been idle too long.

          Show
          Owen O'Malley added a comment - No, the ping will only be sent if there are outstanding rpc calls, not when the connection is idle. It is still right to close the connection if it has been idle too long.
          Hide
          Owen O'Malley added a comment -

          This patch hasn't been tested at scale, but it passes unit tests.

          It changes:
          1. The call dropping code in the server is removed.
          2. The client timeout code is removed.
          3. The client writes an ipc with length = -1 once a minute if there are rpcs using the connection.
          4. The server ignores messages with length = -1
          5. A few Iterators have been typed and interruptedexceptions not ignored.
          6. Added a new closing state for ipc connections.
          7. Explicitly kill the active rpcs when a connection is broken.

          Show
          Owen O'Malley added a comment - This patch hasn't been tested at scale, but it passes unit tests. It changes: 1. The call dropping code in the server is removed. 2. The client timeout code is removed. 3. The client writes an ipc with length = -1 once a minute if there are rpcs using the connection. 4. The server ignores messages with length = -1 5. A few Iterators have been typed and interruptedexceptions not ignored. 6. Added a new closing state for ipc connections. 7. Explicitly kill the active rpcs when a connection is broken.
          Hide
          Raghu Angadi added a comment -

          Owen, could describe the policy on Server with the patch? (I mean what and how it fixes...)

          for e.g. :

          1. MAX_QUEUE_LENGTH is removed.. but not what stops server from reading too many (no need to read from clients if there are enough RPCs already pending).
          Show
          Raghu Angadi added a comment - Owen, could describe the policy on Server with the patch? (I mean what and how it fixes...) for e.g. : MAX_QUEUE_LENGTH is removed.. but not what stops server from reading too many (no need to read from clients if there are enough RPCs already pending).
          Hide
          Owen O'Malley added a comment -

          It would be better if the server stopped reading the incoming queue, but I believe it is not critical for a first pass. The reason is that without the client side timeouts, the size of the queue is bounded by the number of client threads calling the server. So even if you have 2k servers running 4 maps each with 5 threads each, that is still only 40k pending calls, which won't be enough to blow out memory.

          Show
          Owen O'Malley added a comment - It would be better if the server stopped reading the incoming queue, but I believe it is not critical for a first pass. The reason is that without the client side timeouts, the size of the queue is bounded by the number of client threads calling the server. So even if you have 2k servers running 4 maps each with 5 threads each, that is still only 40k pending calls, which won't be enough to blow out memory.
          Hide
          Doug Cutting added a comment -

          The approach looks good. The iterator typing adds noise to the patch.

          I'm confused by the 'out.close()' call in sendPing(). Is that intended?

          Perhaps the ping interval should be configurable (undocumented) and a test case could be added that sets a short ping time, shuts down a server, and sees that the client fails promptly?

          Show
          Doug Cutting added a comment - The approach looks good. The iterator typing adds noise to the patch. I'm confused by the 'out.close()' call in sendPing(). Is that intended? Perhaps the ping interval should be configurable (undocumented) and a test case could be added that sets a short ping time, shuts down a server, and sees that the client fails promptly?
          Hide
          Raghu Angadi added a comment -

          > 7. Explicitly kill the active rpcs when a connection is broken.
          Owen, I could not see where this is done in the patch. Could you point to the file that has this change?

          Also, client used to wait inside waitForWor() if there are no pending RPCs, but now it waits on read() from socket. Is that intended?

          Show
          Raghu Angadi added a comment - > 7. Explicitly kill the active rpcs when a connection is broken. Owen, I could not see where this is done in the patch. Could you point to the file that has this change? Also, client used to wait inside waitForWor() if there are no pending RPCs, but now it waits on read() from socket. Is that intended?
          Hide
          Owen O'Malley added a comment -

          Thanks for the catch, Doug. The close should have been a flush. And you are right that I probably should make a hidden switch to control the ping rate.

          Raghu, the rpcs are killed at the top of Client.Connection.close.

          I'll think about the difference between waiting in the waitForWork versus the readint. smile

          Show
          Owen O'Malley added a comment - Thanks for the catch, Doug. The close should have been a flush. And you are right that I probably should make a hidden switch to control the ping rate. Raghu, the rpcs are killed at the top of Client.Connection.close. I'll think about the difference between waiting in the waitForWork versus the readint. smile
          Hide
          Sameer Paranjpye added a comment -

          Does the server side continue to have a timeout?
          Seems like it should, badly behaved clients could cause a server brownout by sending RPCs, never reading the responses and so occupying all the handlers.

          Show
          Sameer Paranjpye added a comment - Does the server side continue to have a timeout? Seems like it should, badly behaved clients could cause a server brownout by sending RPCs, never reading the responses and so occupying all the handlers.
          Hide
          Raghu Angadi added a comment -

          > the rpcs are killed at the top of Client.Connection.close.
          hmm.. client side does not help the server side. One big issue at the server when it is loaded is that we will have a lot of RPCs waiting and server patiently executes all of them only to realize that client side has closed the connection for many of them, this in turn slows down the other clients, they will close the connections. I think we need Sever to avoid processing RPCs from connections which are closed already... each handler could check if the connection has a error.

          Show
          Raghu Angadi added a comment - > the rpcs are killed at the top of Client.Connection.close. hmm.. client side does not help the server side. One big issue at the server when it is loaded is that we will have a lot of RPCs waiting and server patiently executes all of them only to realize that client side has closed the connection for many of them, this in turn slows down the other clients, they will close the connections. I think we need Sever to avoid processing RPCs from connections which are closed already... each handler could check if the connection has a error.
          Hide
          Raghu Angadi added a comment -

          > One big issue at the server when it is loaded is that we will have a lot of RPCs waiting and server...
          with this patch client is not going to close the connection.. so the server problem may not as bad. That depends on how we handle the individual RPC timeouts

          What about the timeout for RPCs (set by users) as opposed to timeout for connection. There is no use of Server excuting RPC that are timedout on client side.

          Show
          Raghu Angadi added a comment - > One big issue at the server when it is loaded is that we will have a lot of RPCs waiting and server... with this patch client is not going to close the connection.. so the server problem may not as bad. That depends on how we handle the individual RPC timeouts What about the timeout for RPCs (set by users) as opposed to timeout for connection. There is no use of Server excuting RPC that are timedout on client side.
          Hide
          Hairong Kuang added a comment -

          Owen's proposal looks good. Here are a few things I plan to do at the client side:
          1. Remove client.call.timeout from Configuration;
          2. Connection thread waits for response with a timeout of 1 minute.
          3. If it does not receive a response, it pings the server to detect server side failure. If failure is detected, mark all pending calls as done.
          4. Remove ConnectionCuller thread. Instead Connection thread detects idle connection itself in waitForWork.
          5. Application thread does not timeout checking for results. Instead it waits until a call is done;

          Show
          Hairong Kuang added a comment - Owen's proposal looks good. Here are a few things I plan to do at the client side: 1. Remove client.call.timeout from Configuration; 2. Connection thread waits for response with a timeout of 1 minute. 3. If it does not receive a response, it pings the server to detect server side failure. If failure is detected, mark all pending calls as done. 4. Remove ConnectionCuller thread. Instead Connection thread detects idle connection itself in waitForWork. 5. Application thread does not timeout checking for results. Instead it waits until a call is done;
          Hide
          Hairong Kuang added a comment -

          In addition to the changes I proposed above, it has the following changes:
          1. remove the field inUse in Client.Connection. In stead use calls.size().
          2. Start the connection thread after I/O streams are set up.
          3. Remove call.setResult, instead use setValuse and setException.

          This patch is posted for review. It passed junit tests. I will perform more extensive large scale tests.

          Show
          Hairong Kuang added a comment - In addition to the changes I proposed above, it has the following changes: 1. remove the field inUse in Client.Connection. In stead use calls.size(). 2. Start the connection thread after I/O streams are set up. 3. Remove call.setResult, instead use setValuse and setException. This patch is posted for review. It passed junit tests. I will perform more extensive large scale tests.
          Hide
          Hairong Kuang added a comment -

          This patch additionally fixed a lot of race conditions in the IPC Client code.

          Show
          Hairong Kuang added a comment - This patch additionally fixed a lot of race conditions in the IPC Client code.
          Hide
          Hairong Kuang added a comment -

          This patch makes sure that the right stack trace is returned when a rpc has an error.

          Show
          Hairong Kuang added a comment - This patch makes sure that the right stack trace is returned when a rpc has an error.
          Hide
          Doug Cutting added a comment -

          minor nits:

          • ipc.ping.interval should be set with a static accessor method that uses a constant to name the config property, not accessed directly with a literal string. Eventually we should replace all literal string configuration accesses with accessor methods, but in the meantime we should use accessor methods for new parameters and when changing existing parameters.
          • go ahead and remove maxQueueSize if it is unused. if its needed later it can be re-added.
          • should we log pings at debug level?
          • we should use a constant instead of a literal -1 to identify a ping
          • out.write(-1) – shouldn't that be writeInt()?
          Show
          Doug Cutting added a comment - minor nits: ipc.ping.interval should be set with a static accessor method that uses a constant to name the config property, not accessed directly with a literal string. Eventually we should replace all literal string configuration accesses with accessor methods, but in the meantime we should use accessor methods for new parameters and when changing existing parameters. go ahead and remove maxQueueSize if it is unused. if its needed later it can be re-added. should we log pings at debug level? we should use a constant instead of a literal -1 to identify a ping out.write(-1) – shouldn't that be writeInt()?
          Hide
          Hairong Kuang added a comment -

          Patch 3 incorporated Doug's comments.

          Show
          Hairong Kuang added a comment - Patch 3 incorporated Doug's comments.
          Hide
          Doug Cutting added a comment -

          +1 This looks fine to me. Have you yet had a chance to test it on a large cluster?

          Show
          Doug Cutting added a comment - +1 This looks fine to me. Have you yet had a chance to test it on a large cluster?
          Hide
          Raghu Angadi added a comment -

          I am looking at the patch too.. though still not done. Couple of queries :

          1. Why does the patch remove purging the connections on server if server is not able to write to the socket?
          2. minor : in one place have '{{do { wait(); }

            while (!cond);}}' its unconventional in the sense, normally the right thing to do is to check for condition before waiting.

          Show
          Raghu Angadi added a comment - I am looking at the patch too.. though still not done. Couple of queries : Why does the patch remove purging the connections on server if server is not able to write to the socket? minor : in one place have '{{do { wait(); } while (!cond);}}' its unconventional in the sense, normally the right thing to do is to check for condition before waiting.
          Hide
          Raghu Angadi added a comment -

          receiveResponse() modifies calls without synchronizing on 'this', but other accesses to it do.

          As you mentioned there is more synchronization involved. It is harder check the correctness with 'isClosed' etc. For e.g. it took some time to see what happens sendParam() returns silently when isClosed is true.

          Show
          Raghu Angadi added a comment - receiveResponse() modifies calls without synchronizing on ' this ', but other accesses to it do. As you mentioned there is more synchronization involved. It is harder check the correctness with 'isClosed' etc. For e.g. it took some time to see what happens sendParam() returns silently when isClosed is true.
          Hide
          Hairong Kuang added a comment -

          Raghu, thanks for the review comments.

          > Why does the patch remove purging the connections on server?
          The purpose of this jira is not to throw away any calls. Otherwise the applications will wait for the result for ever. I do not think we should concern about the number of pending responses so much because hadoop-2910 throttle the client write rate and in general a client's read rate should not be slower than the write rate. If this is really a concern, what we can do is to limit the total number of calls in the server including unserved ones, being served ones, and pending responding ones. HADOOP-2910 only limits the unserved calls.

          Show
          Hairong Kuang added a comment - Raghu, thanks for the review comments. > Why does the patch remove purging the connections on server? The purpose of this jira is not to throw away any calls. Otherwise the applications will wait for the result for ever. I do not think we should concern about the number of pending responses so much because hadoop-2910 throttle the client write rate and in general a client's read rate should not be slower than the write rate. If this is really a concern, what we can do is to limit the total number of calls in the server including unserved ones, being served ones, and pending responding ones. HADOOP-2910 only limits the unserved calls.
          Hide
          Hairong Kuang added a comment -

          > receiveResponse() modifies calls without synchronizing on 'this', but other accesses to it do.
          Calls is of type HashTable, which is a synchronized data structure. There is no need to hold the connection lock.

          Show
          Hairong Kuang added a comment - > receiveResponse() modifies calls without synchronizing on 'this', but other accesses to it do. Calls is of type HashTable, which is a synchronized data structure. There is no need to hold the connection lock.
          Hide
          Raghu Angadi added a comment -

          > The purpose of this jira is not to throw away any calls.

          True. But what should server do if some clients just don't read from the sockets? I think purging exists only to handle exceptional cases like (unintentionally) rogue clients. One actual case that happened is that one user accindentally started thousands of clients from one machine and these clients could not read.

          > Otherwise the applications will wait for the result for ever.
          In that case, we could close the connection while purging, we already do that in some cases while purging.

          In any case purging is usually not required and almost no calls are purged while writing in practice.. we won't notice its absence until something bad happens.

          +0 for removing it, from me. I don't think that should hold up the patch if you are ok with it.

          Show
          Raghu Angadi added a comment - > The purpose of this jira is not to throw away any calls. True. But what should server do if some clients just don't read from the sockets? I think purging exists only to handle exceptional cases like (unintentionally) rogue clients. One actual case that happened is that one user accindentally started thousands of clients from one machine and these clients could not read. > Otherwise the applications will wait for the result for ever. In that case, we could close the connection while purging, we already do that in some cases while purging. In any case purging is usually not required and almost no calls are purged while writing in practice.. we won't notice its absence until something bad happens. +0 for removing it, from me. I don't think that should hold up the patch if you are ok with it.
          Hide
          Hairong Kuang added a comment -

          > As you mentioned there is more synchronization involved. It is harder check the correctness with 'isClosed' etc. For e.g. it took some time to see what happens sendParam() returns silently when isClosed is true.

          Since this patch removes SocketTimeoutException, it exposes quite a lot incorrect synchronizations in the code. Previously applications receive a SocketTimeoutException when a call is lost but now applications get stuck for ever. It took me quite a lot of energy to debug and sort out the synchronization part. Thank you for taking time to check its correctness.

          > what should server do if some clients just don't read from the sockets? I think purging exists only to handle exceptional cases like (unintentionally) rogue clients. One actual case that happened is that one user accindentally started thousands of clients from one machine and these clients could not read.

          I think we should assume that clients uses IPC Client to talk to the IPC server, so no worry about their not reading from the sockets. In the case of 1000 clients per machine, if they all can send requests, why could not they read?

          Show
          Hairong Kuang added a comment - > As you mentioned there is more synchronization involved. It is harder check the correctness with 'isClosed' etc. For e.g. it took some time to see what happens sendParam() returns silently when isClosed is true. Since this patch removes SocketTimeoutException, it exposes quite a lot incorrect synchronizations in the code. Previously applications receive a SocketTimeoutException when a call is lost but now applications get stuck for ever. It took me quite a lot of energy to debug and sort out the synchronization part. Thank you for taking time to check its correctness. > what should server do if some clients just don't read from the sockets? I think purging exists only to handle exceptional cases like (unintentionally) rogue clients. One actual case that happened is that one user accindentally started thousands of clients from one machine and these clients could not read. I think we should assume that clients uses IPC Client to talk to the IPC server, so no worry about their not reading from the sockets. In the case of 1000 clients per machine, if they all can send requests, why could not they read?
          Hide
          Raghu Angadi added a comment -

          Could you up load the latest patch that applies to trunk?

          Show
          Raghu Angadi added a comment - Could you up load the latest patch that applies to trunk?
          Hide
          Hairong Kuang added a comment -

          This patch incorporates Raghu's comments. It also fixed a few bugs found when running JUnit testcases.

          Show
          Hairong Kuang added a comment - This patch incorporates Raghu's comments. It also fixed a few bugs found when running JUnit testcases.
          Hide
          Raghu Angadi added a comment -

          +1.

          The patch deletes "hadoop.job.history.location" from hadoop-default.xml. Not sure if this is intentional.

          Show
          Raghu Angadi added a comment - +1. The patch deletes "hadoop.job.history.location" from hadoop-default.xml. Not sure if this is intentional.
          Hide
          Hairong Kuang added a comment -

          The new patch removed the change Raghu mentioned.

          Show
          Hairong Kuang added a comment - The new patch removed the change Raghu mentioned.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378362/ipc-timeout4.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 9 new or modified tests.

          javadoc -1. The javadoc tool appears to have generated 1 warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

          core tests -1. The patch failed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2021/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2021/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2021/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2021/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378362/ipc-timeout4.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 9 new or modified tests. javadoc -1. The javadoc tool appears to have generated 1 warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs -1. The patch appears to introduce 1 new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2021/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2021/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2021/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2021/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          Fixed javadoc and findbugs warnings.

          Show
          Hairong Kuang added a comment - Fixed javadoc and findbugs warnings.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378398/ipc-timeout5.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 9 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests -1. The patch failed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2024/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2024/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2024/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2024/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378398/ipc-timeout5.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 9 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2024/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2024/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2024/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2024/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          This patch applies to the trunk. And one difference from the previous patch is that it keeps the purge code, which closes a connection if the connection has a pending response in the queue for more than 15 minutes.

          Show
          Hairong Kuang added a comment - This patch applies to the trunk. And one difference from the previous patch is that it keeps the purge code, which closes a connection if the connection has a pending response in the queue for more than 15 minutes.
          Hide
          Raghu Angadi added a comment - - edited

          +1. I looked only at the recent change.

          We could set {{call.timestamp only inside doRespond() when it is queue to the responder. This will avoid two calls to getTime() for each RPC in the normal case... not that it will really be noticeable.

          Show
          Raghu Angadi added a comment - - edited +1. I looked only at the recent change. We could set {{call.timestamp only inside doRespond() when it is queue to the responder. This will avoid two calls to getTime() for each RPC in the normal case... not that it will really be noticeable.
          Hide
          Hairong Kuang added a comment -

          This patch additionally has the following changes:
          1. In Server, a call's time stamp is updated when it is received and when its reponse is sent to the Responder.
          2. Client.Connection.in is accessed by only the Connection thread, so its lock is no longer to be acquired. This eliminates a possible deadlock in the code.

          Show
          Hairong Kuang added a comment - This patch additionally has the following changes: 1. In Server, a call's time stamp is updated when it is received and when its reponse is sent to the Responder. 2. Client.Connection.in is accessed by only the Connection thread, so its lock is no longer to be acquired. This eliminates a possible deadlock in the code.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12380967/ipc-timeout7.patch
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 9 new or modified tests.

          javadoc -1. The javadoc tool appears to have generated 1 warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2332/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2332/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2332/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2332/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12380967/ipc-timeout7.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 9 new or modified tests. javadoc -1. The javadoc tool appears to have generated 1 warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2332/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2332/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2332/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2332/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          fixed the javadoc warning.

          Show
          Hairong Kuang added a comment - fixed the javadoc warning.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381200/ipc-timeout8.patch
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 9 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2352/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2352/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2352/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2352/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12381200/ipc-timeout8.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 9 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2352/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2352/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2352/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2352/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          This patch bumps up the RPC version number. It also fixes a bug occured during large scale testing.

          Show
          Hairong Kuang added a comment - This patch bumps up the RPC version number. It also fixes a bug occured during large scale testing.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381353/ipc-timeout9.patch
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 9 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests -1. The patch failed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2375/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2375/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2375/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2375/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12381353/ipc-timeout9.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 9 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2375/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2375/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2375/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2375/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          GridMix ran successfully on 200-node cluster without any performance degradation.

          Show
          Hairong Kuang added a comment - GridMix ran successfully on 200-node cluster without any performance degradation.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381353/ipc-timeout9.patch
          against trunk revision 653552.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2398/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2398/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2398/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2398/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12381353/ipc-timeout9.patch against trunk revision 653552. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2398/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2398/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2398/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2398/console This message is automatically generated.
          Hide
          Hairong Kuang added a comment -

          I just committed this.

          The failure of TestTrash in the second to last hudson test seems not related to this issue. Multiple reruns of the unit test could not reproduce the failure.

          Show
          Hairong Kuang added a comment - I just committed this. The failure of TestTrash in the second to last hudson test seems not related to this issue. Multiple reruns of the unit test could not reproduce the failure.
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #483 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/483/ )

            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Owen O'Malley
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development