HttpComponents HttpCore
  1. HttpComponents HttpCore
  2. HTTPCORE-319

SSLIOSession goes into a loop if the server rejects an invalid certificate

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 4.2.2
    • Fix Version/s: 4.2.4, 4.3-beta1
    • Component/s: HttpCore NIO
    • Labels:
      None

      Description

      To reproduce:

      • Set up an SSL server that requests certificates from the client.
      • Set up a client with an expired SSL certificate.
      • Establish a connection from the client to the server using BaseNIOReactor and SSLIOSession.

      The server will proceed through the handshake until the client supplies its certificate in response to the CertificateRequest message. At this point, the server's certificate verification will fail and it will close the connection.

      The client socket will become readable due to the EOF and the SSLIOSession.isAppInputReady() method is called to handle the EOF. The bytesRead gets set to -1, which sets this.endOfStream = true. Nothing ever sets the session into the CLOSING or CLOSED state, so it keeps looping on the readable EOF event.

      I'm not sure what the best approach to fixing this should be. It appears that if I close the session manually with the debugger from inside isAppInputReady, the system proceeds normally from that point, however I don't know what the implications of doing that might be.

      1. bug.log
        7 kB
        Scott Stanton
      2. log
        6 kB
        Scott Stanton

        Issue Links

          Activity

          Hide
          Scott Stanton added a comment -

          Yes, I think you can close this one now.

          Show
          Scott Stanton added a comment - Yes, I think you can close this one now.
          Hide
          Oleg Kalnichevski added a comment -

          Scott, any news on this one? Can it be closed as resolved?

          Oleg

          Show
          Oleg Kalnichevski added a comment - Scott, any news on this one? Can it be closed as resolved? Oleg
          Hide
          Scott Stanton added a comment -

          Great! I've applied the patch and it appears to fix the problem in my manual tests. I'll put it up on our servers and see how it does under real conditions.

          Show
          Scott Stanton added a comment - Great! I've applied the patch and it appears to fix the problem in my manual tests. I'll put it up on our servers and see how it does under real conditions.
          Hide
          Oleg Kalnichevski added a comment -

          Yes, that explains it.

          I committed a change to 4.2.x branch [1] that should cause the SSLIOSession to force terminate itself if all of the above mentioned conditions are met. I am not entirely sure there can be no side effects due to this, but for the time being that looks like the most obvious fix to the problem.

          Please re-test with the latest SVN snapshot and let me know if the problem still persists.

          Oleg

          [1] http://svn.apache.org/viewvc?rev=1418951&view=rev

          Show
          Oleg Kalnichevski added a comment - Yes, that explains it. I committed a change to 4.2.x branch [1] that should cause the SSLIOSession to force terminate itself if all of the above mentioned conditions are met. I am not entirely sure there can be no side effects due to this, but for the time being that looks like the most obvious fix to the problem. Please re-test with the latest SVN snapshot and let me know if the problem still persists. Oleg [1] http://svn.apache.org/viewvc?rev=1418951&view=rev
          Hide
          Scott Stanton added a comment -

          Is there some additional data I can gather for you? I'm able to reproduce the problem within a couple of minutes now, if you want me to add more logging.

          Show
          Scott Stanton added a comment - Is there some additional data I can gather for you? I'm able to reproduce the problem within a couple of minutes now, if you want me to add more logging.
          Hide
          Scott Stanton added a comment -

          Walking through the loop, it looks like there isn't any data in the ssl session writer, but the http client connection does have data in its outbuf. However because the handshaker still exists the getHandshakeStatus call returns NEEDS_UNWRAP, so isAppOutputReady() returns false, skipping onOutputReady() each time around. It never flushes the remaining data to the socket because it thinks there's handshaking going on.

          Show
          Scott Stanton added a comment - Walking through the loop, it looks like there isn't any data in the ssl session writer, but the http client connection does have data in its outbuf. However because the handshaker still exists the getHandshakeStatus call returns NEEDS_UNWRAP, so isAppOutputReady() returns false, skipping onOutputReady() each time around. It never flushes the remaining data to the socket because it thinks there's handshaking going on.
          Hide
          Oleg Kalnichevski added a comment -

          That should not matter in my opinion. A NIO channel can be both readable and writable at the same time. What I cannot quite understand from the logs and your description is why no attempts are made to write data out even though the channel is clearly writable.

          Oleg

          Show
          Oleg Kalnichevski added a comment - That should not matter in my opinion. A NIO channel can be both readable and writable at the same time. What I cannot quite understand from the logs and your description is why no attempts are made to write data out even though the channel is clearly writable. Oleg
          Hide
          Scott Stanton added a comment -

          It seems like the primary problem is the readable mask remaining set after
          the eof is detected. If you clear that but continue to push data until
          done or timeout, would that be sufficient?

          Show
          Scott Stanton added a comment - It seems like the primary problem is the readable mask remaining set after the eof is detected. If you clear that but continue to push data until done or timeout, would that be sufficient?
          Hide
          Oleg Kalnichevski added a comment -

          Hi Scott

          I think it is pretty clear as to what is causing the loop. What I am still unsure about is to how terminate it correctly without losing ability to gracefully terminate half-closed SSL connections. One thing makes no sense to me is why we are not seeing any 'produce output' events firing even though the session clearly has write interest and readiness flags set.

          Oleg

          Show
          Oleg Kalnichevski added a comment - Hi Scott I think it is pretty clear as to what is causing the loop. What I am still unsure about is to how terminate it correctly without losing ability to gracefully terminate half-closed SSL connections. One thing makes no sense to me is why we are not seeing any 'produce output' events firing even though the session clearly has write interest and readiness flags set. Oleg
          Hide
          Scott Stanton added a comment -

          Ok, good news. I finally managed to reproduce the problem in the debugger with full logging. See [session|^agent002.log] and [ssl|^ssl.log] files for full tracing. The steps I took to reproduce the problem were as follows:

          • set a breakpoint in SSLEngineImpl.java:491 on the return NEED_UNWRAP line
          • establish a connection
          • watch for the connection handshake to get to the CertificateRequest message so there is a lot of data in the buffers
          • kill the other end of the connection

          At this point the NHttpConnectionBase.close() method will take the first condition (outbuf.hasData()) and set the write event mask without closing the connection and then return to the event loop. When it unwinds to the event loop, both the readable and writable flags are set causing the event loop to immediately wake up again due to the persistent EOF event. The write buffer still has data in it, which continues to prevent the close, despite the fact that the receiving end is gone. I think the intent is for the timeout to take care of this, but because of the persistently readable condition, the socket never has a chance to stay in the reactor long enough to time out.

          I believe the fundamental problem is that the session is allowing the read flag to remain set after EOF. If that were cleared, the socket timeout should eventually close the connection.

          Show
          Scott Stanton added a comment - Ok, good news. I finally managed to reproduce the problem in the debugger with full logging. See [session|^agent002.log] and [ssl|^ssl.log] files for full tracing. The steps I took to reproduce the problem were as follows: set a breakpoint in SSLEngineImpl.java:491 on the return NEED_UNWRAP line establish a connection watch for the connection handshake to get to the CertificateRequest message so there is a lot of data in the buffers kill the other end of the connection At this point the NHttpConnectionBase.close() method will take the first condition (outbuf.hasData()) and set the write event mask without closing the connection and then return to the event loop. When it unwinds to the event loop, both the readable and writable flags are set causing the event loop to immediately wake up again due to the persistent EOF event. The write buffer still has data in it, which continues to prevent the close, despite the fact that the receiving end is gone. I think the intent is for the timeout to take care of this, but because of the persistently readable condition, the socket never has a chance to stay in the reactor long enough to time out. I believe the fundamental problem is that the session is allowing the read flag to remain set after EOF. If that were cleared, the socket timeout should eventually close the connection.
          Hide
          Scott Stanton added a comment - - edited

          Unfortunately we're still seeing the busy loop. I've attached the log that I have from this failure. The sequence appears to be similar to last time in that the socket is closed while there is pending data in the write buffer. The ssl engine handshaker is reporting that need_unwrap is true on the socket.

          Show
          Scott Stanton added a comment - - edited Unfortunately we're still seeing the busy loop. I've attached the log that I have from this failure. The sequence appears to be similar to last time in that the socket is closed while there is pending data in the write buffer. The ssl engine handshaker is reporting that need_unwrap is true on the socket.
          Hide
          Oleg Kalnichevski added a comment -

          Hi Scott
          I believe I have found and (hopefully) fixed the problem. It turned out to be quite tricky. Several rare conditions have to coincide in order to leave the affected SSL connection effectively locked in the CLOSING state being both unable to decrypt incoming data and flush output data and close itself gracefully. So, for the problem to occur the underlying connection has to be abnormally terminated by the opposite end in the middle of a request execution. There has to be a partial or corrupt incoming message stuck in the SSL input buffer which cannot be decrypted without more data (leaving the SSL engine permanently stuck in the UNWRAP state). There has to be some data still stuck in the session output buffer (POST request body in your code) which needs to be flushed in order to close the connection gracefully. The bug causes the SSL I/O session to incorrectly set the i/o event interest mask (output interest does not get activated while the SSL engine is in the UNWRAP state).

          There are different ways of approaching the problem. I committed a rather non-intrusive fix which I believe should fix the problem but I would need your confirmation that this is indeed the case. I also added more SSL session details that get printed to the log and a few defensive checks. Could you please get the latest code from the 4.2.x branch (with this fix [1]) and try it out in your environment? If the problem still persists, please attach a complete wire log of the session to this ticket. I would have to try something more invasive.

          Oleg

          [1] http://svn.apache.org/viewvc?view=revision&revision=1415878

          Show
          Oleg Kalnichevski added a comment - Hi Scott I believe I have found and (hopefully) fixed the problem. It turned out to be quite tricky. Several rare conditions have to coincide in order to leave the affected SSL connection effectively locked in the CLOSING state being both unable to decrypt incoming data and flush output data and close itself gracefully. So, for the problem to occur the underlying connection has to be abnormally terminated by the opposite end in the middle of a request execution. There has to be a partial or corrupt incoming message stuck in the SSL input buffer which cannot be decrypted without more data (leaving the SSL engine permanently stuck in the UNWRAP state). There has to be some data still stuck in the session output buffer (POST request body in your code) which needs to be flushed in order to close the connection gracefully. The bug causes the SSL I/O session to incorrectly set the i/o event interest mask (output interest does not get activated while the SSL engine is in the UNWRAP state). There are different ways of approaching the problem. I committed a rather non-intrusive fix which I believe should fix the problem but I would need your confirmation that this is indeed the case. I also added more SSL session details that get printed to the log and a few defensive checks. Could you please get the latest code from the 4.2.x branch (with this fix [1] ) and try it out in your environment? If the problem still persists, please attach a complete wire log of the session to this ticket. I would have to try something more invasive. Oleg [1] http://svn.apache.org/viewvc?view=revision&revision=1415878
          Hide
          Scott Stanton added a comment -

          Stepping through the loop in the debugger, I see the following:

          • selector detects a readable event (EOF)
          • processing the event winds up in SSLIOSession.isAppInputReady()
          • next it calls doHandshake which goes through the NEED_UNWRAP case and returns
          • back in isAppInputReady, the appEventMask is OP_WRITE, so it returns false
          • we skip onInputReady and go back to the event loop
          • rinse and repeat
          Show
          Scott Stanton added a comment - Stepping through the loop in the debugger, I see the following: selector detects a readable event (EOF) processing the event winds up in SSLIOSession.isAppInputReady() next it calls doHandshake which goes through the NEED_UNWRAP case and returns back in isAppInputReady, the appEventMask is OP_WRITE, so it returns false we skip onInputReady and go back to the event loop rinse and repeat
          Hide
          Scott Stanton added a comment -

          1) The loop I was seeing was that the socket is readable because of an EOF, SSLIOSession.isAppInputReady is called and detects bytesRead is -1, setting endOfStrream to true. It then continues on to doHandshake, etc and returns to the reactor, leaving the read mask set. Since the socket is still at EOF, the reactor immediately starts the whole process over again.
          2) I still don't have a reliable way to reproduce the problem (it only happens a couple of times each day), but I'll try the patch and see if it helps.

          Show
          Scott Stanton added a comment - 1) The loop I was seeing was that the socket is readable because of an EOF, SSLIOSession.isAppInputReady is called and detects bytesRead is -1, setting endOfStrream to true. It then continues on to doHandshake, etc and returns to the reactor, leaving the read mask set. Since the socket is still at EOF, the reactor immediately starts the whole process over again. 2) I still don't have a reliable way to reproduce the problem (it only happens a couple of times each day), but I'll try the patch and see if it helps.
          Hide
          Oleg Kalnichevski added a comment -

          I think I have a theory what might be happening here. I think the cause of the problem is a bug in the HTTP connection state management logic triggered by premature termination of the connection but some things just do not seem to add up. Most importantly I do not see where the I/O session code can be looping. Another thing is that I do not see why #produceOutput method does not fire, even though the session clearly enables output events in order to flush some data stuck in the output buffer. It appears that the connection is still trying to write out request body while the remote endpoint is clearly no longer expecting any more data. More mystery. If the connection can timeout why on earth are we not seeing any activity on that session that keeps inactivity timer from triggering a timeout event?

          Could you please do the following for me?

          (1) Could you please tell me exactly there the session is looping, which method and which loop?
          (2) After that could you please apply the following patch and let me know if that makes any difference?

          Oleg


          diff --git a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java
          index bf3eb62..0c10041 100644
          — a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java
          +++ b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java
          @@ -156,10 +156,6 @@ public class DefaultNHttpClientConnection
          }

          public void consumeInput(final NHttpClientEventHandler handler) {

          • if (this.status != ACTIVE) { - this.session.clearEvent(EventMask.READ); - return; - }
            try {
            if (this.response == null) { int bytesRead; diff --git a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java index 28b4046..ffb834d 100644 --- a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java +++ b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java @@ -162,10 +162,6 @@ public class DefaultNHttpServerConnection }

            public void consumeInput(final NHttpServerEventHandler handler) {
            - if (this.status != ACTIVE) { - this.session.clearEvent(EventMask.READ); - return; - }

            try {
            if (this.request == null) {
            int bytesRead;

          Show
          Oleg Kalnichevski added a comment - I think I have a theory what might be happening here. I think the cause of the problem is a bug in the HTTP connection state management logic triggered by premature termination of the connection but some things just do not seem to add up. Most importantly I do not see where the I/O session code can be looping. Another thing is that I do not see why #produceOutput method does not fire, even though the session clearly enables output events in order to flush some data stuck in the output buffer. It appears that the connection is still trying to write out request body while the remote endpoint is clearly no longer expecting any more data. More mystery. If the connection can timeout why on earth are we not seeing any activity on that session that keeps inactivity timer from triggering a timeout event? Could you please do the following for me? (1) Could you please tell me exactly there the session is looping, which method and which loop? (2) After that could you please apply the following patch and let me know if that makes any difference? Oleg — diff --git a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java index bf3eb62..0c10041 100644 — a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java +++ b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java @@ -156,10 +156,6 @@ public class DefaultNHttpClientConnection } public void consumeInput(final NHttpClientEventHandler handler) { if (this.status != ACTIVE) { - this.session.clearEvent(EventMask.READ); - return; - } try { if (this.response == null) { int bytesRead; diff --git a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java index 28b4046..ffb834d 100644 --- a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java +++ b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java @@ -162,10 +162,6 @@ public class DefaultNHttpServerConnection } public void consumeInput(final NHttpServerEventHandler handler) { - if (this.status != ACTIVE) { - this.session.clearEvent(EventMask.READ); - return; - } try { if (this.request == null) { int bytesRead;
          Hide
          Oleg Kalnichevski added a comment -

          I think I have a theory what might be happening here. I think the cause of the problem is a bug in the HTTP connection state management logic triggered by premature termination of the connection but some things just do not seem to add up. Most importantly I do not see where the I/O session code can be looping. Another thing is that I do not see why #produceOutput method does not fire, even though the session clearly enables output events in order to flush some data stuck in the output buffer. It appears that the connection is still trying to write out request body while the remote endpoint is clearly no longer expecting any more data.

          Could you please do the following for me?
          (1) Could you please tell me exactly there the session is looping, which method and which loop?
          (2) After that could you please apply the following patch and let me know if that makes any difference?

          Oleg


          diff --git a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java
          index bf3eb62..0c10041 100644
          — a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java
          +++ b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java
          @@ -156,10 +156,6 @@ public class DefaultNHttpClientConnection
          }

          public void consumeInput(final NHttpClientEventHandler handler) {

          • if (this.status != ACTIVE) { - this.session.clearEvent(EventMask.READ); - return; - }
            try {
            if (this.response == null) { int bytesRead; diff --git a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java index 28b4046..ffb834d 100644 --- a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java +++ b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java @@ -162,10 +162,6 @@ public class DefaultNHttpServerConnection }

            public void consumeInput(final NHttpServerEventHandler handler) {
            - if (this.status != ACTIVE) { - this.session.clearEvent(EventMask.READ); - return; - }

            try {
            if (this.request == null) {
            int bytesRead;

          Show
          Oleg Kalnichevski added a comment - I think I have a theory what might be happening here. I think the cause of the problem is a bug in the HTTP connection state management logic triggered by premature termination of the connection but some things just do not seem to add up. Most importantly I do not see where the I/O session code can be looping. Another thing is that I do not see why #produceOutput method does not fire, even though the session clearly enables output events in order to flush some data stuck in the output buffer. It appears that the connection is still trying to write out request body while the remote endpoint is clearly no longer expecting any more data. Could you please do the following for me? (1) Could you please tell me exactly there the session is looping, which method and which loop? (2) After that could you please apply the following patch and let me know if that makes any difference? Oleg — diff --git a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java index bf3eb62..0c10041 100644 — a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java +++ b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpClientConnection.java @@ -156,10 +156,6 @@ public class DefaultNHttpClientConnection } public void consumeInput(final NHttpClientEventHandler handler) { if (this.status != ACTIVE) { - this.session.clearEvent(EventMask.READ); - return; - } try { if (this.response == null) { int bytesRead; diff --git a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java index 28b4046..ffb834d 100644 --- a/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java +++ b/httpcore-nio/src/main/java/org/apache/http/impl/nio/DefaultNHttpServerConnection.java @@ -162,10 +162,6 @@ public class DefaultNHttpServerConnection } public void consumeInput(final NHttpServerEventHandler handler) { - if (this.status != ACTIVE) { - this.session.clearEvent(EventMask.READ); - return; - } try { if (this.request == null) { int bytesRead;
          Hide
          Scott Stanton added a comment - - edited

          I have reproduced it using the head of the 4.2.x branch from shortly before you updated to 4.2.4-SNAPSHOT. There should be no deltas relative to the 4.2.3 release other than the version number. I'll try rerunning with the official bits, but I don't expect there to be any difference. It looks like I'll still need to build my own copy of httpcore-contrib since that's not in the repository.

          In any case, I've attached the logging class output for the socket in question. I don't have ssl debug output because it is simply too verbose to collect over the timeframe needed to get it to fail. However, looking at the logs, I do see something that looks suspicious. The very first log line is from a different nio thread that claims to be closing http-outgoing-6, followed by the beginning of the handshake. The final message in the log happens a day later after I attached with the debugger to confirm that I was hitting the same line in the busy loop:

          SSLIOSession.java:370

          public synchronized boolean isAppInputReady() throws IOException {
          int bytesRead = receiveEncryptedData();
          if (bytesRead == -1)

          { this.endOfStream = true; }

          After sitting on a breakpoint at that line for long enough for the socket idle timeout, the socket was closed normally when I resumed.

          Show
          Scott Stanton added a comment - - edited I have reproduced it using the head of the 4.2.x branch from shortly before you updated to 4.2.4-SNAPSHOT. There should be no deltas relative to the 4.2.3 release other than the version number. I'll try rerunning with the official bits, but I don't expect there to be any difference. It looks like I'll still need to build my own copy of httpcore-contrib since that's not in the repository. In any case, I've attached the logging class output for the socket in question. I don't have ssl debug output because it is simply too verbose to collect over the timeframe needed to get it to fail. However, looking at the logs, I do see something that looks suspicious. The very first log line is from a different nio thread that claims to be closing http-outgoing-6, followed by the beginning of the handshake. The final message in the log happens a day later after I attached with the debugger to confirm that I was hitting the same line in the busy loop: SSLIOSession.java:370 public synchronized boolean isAppInputReady() throws IOException { int bytesRead = receiveEncryptedData(); if (bytesRead == -1) { this.endOfStream = true; } After sitting on a breakpoint at that line for long enough for the socket idle timeout, the socket was closed normally when I resumed.
          Hide
          Oleg Kalnichevski added a comment -

          Scott,
          Could you please re-test your application with 4.2.3 to be sure the problem is reproducible with the official release of HttpCore (just in case something might have gone wrong when the patch had been applied). 4.2.3 should be available in the Maven central and its mirrors by now.

          Oleg

          Show
          Oleg Kalnichevski added a comment - Scott, Could you please re-test your application with 4.2.3 to be sure the problem is reproducible with the official release of HttpCore (just in case something might have gone wrong when the patch had been applied). 4.2.3 should be available in the Maven central and its mirrors by now. Oleg
          Hide
          Scott Stanton added a comment -

          Unfortunately it looks like this is a different issue as it is still happening even after the patch. We don't have a quick test case, but we seem to be able to get it to happen within a day or two, so I will try to gather the logging output you have requested.

          Show
          Scott Stanton added a comment - Unfortunately it looks like this is a different issue as it is still happening even after the patch. We don't have a quick test case, but we seem to be able to get it to happen within a day or two, so I will try to gather the logging output you have requested.
          Hide
          Oleg Kalnichevski added a comment -

          Scott
          I believe the problem has been resolved with this change [1]. Could you please re-test your application with the latest SVN snapshot off the 4.2.x branch and confirm the fix. I'll start working on the 4.2.3 release as soon as the fix is confirmed.

          Oleg

          [1] http://svn.apache.org/viewvc?view=revision&revision=1410793

          Show
          Oleg Kalnichevski added a comment - Scott I believe the problem has been resolved with this change [1] . Could you please re-test your application with the latest SVN snapshot off the 4.2.x branch and confirm the fix. I'll start working on the 4.2.3 release as soon as the fix is confirmed. Oleg [1] http://svn.apache.org/viewvc?view=revision&revision=1410793
          Hide
          Oleg Kalnichevski added a comment -

          Scott
          If you are able to reproduce the problem more or less easily, could you please try the fix proposed in HTTPCORE-322? You will have to get the latest snapshot from the 4.2.x branch [1], patch it and build a custom version of HttpCore. That would be hugely appreciated.

          Oleg

          [1] http://svn.apache.org/repos/asf/httpcomponents/httpcore/branches/4.2.x/

          Show
          Oleg Kalnichevski added a comment - Scott If you are able to reproduce the problem more or less easily, could you please try the fix proposed in HTTPCORE-322 ? You will have to get the latest snapshot from the 4.2.x branch [1] , patch it and build a custom version of HttpCore. That would be hugely appreciated. Oleg [1] http://svn.apache.org/repos/asf/httpcomponents/httpcore/branches/4.2.x/
          Hide
          Oleg Kalnichevski added a comment -

          Everything but wire. Could you please also turn on SSL debug logs?

          Oleg

          Show
          Oleg Kalnichevski added a comment - Everything but wire. Could you please also turn on SSL debug logs? Oleg
          Hide
          Scott Stanton added a comment -

          Ok, I think I've got it hooked up. Is basic logging enough, or do you need wire+headers, too?

          Show
          Scott Stanton added a comment - Ok, I think I've got it hooked up. Is basic logging enough, or do you need wire+headers, too?
          Hide
          Scott Stanton added a comment -

          I'm not sure how to use the logging classes with SSLNHttpClientConnectionFactory. I see that LoggingIOSession is a decorator around another IOSession, but I don't see how I can merge the factory or connection classes with the SSL classes. Am I missing something?

          Show
          Scott Stanton added a comment - I'm not sure how to use the logging classes with SSLNHttpClientConnectionFactory. I see that LoggingIOSession is a decorator around another IOSession, but I don't see how I can merge the factory or connection classes with the SSL classes. Am I missing something?
          Hide
          Oleg Kalnichevski added a comment -

          Scott
          I am currently at loss as to how to approach this problem. Would it be a big deal for you to add i/o session and context logging using the following classes from the contrib package [1] and try to produce a log exhibiting the problem?

          Oleg

          [1] http://svn.apache.org/repos/asf/httpcomponents/contrib/httpcore/branches/4.2.x/src/main/java/org/apache/http/contrib/logging/

          Show
          Oleg Kalnichevski added a comment - Scott I am currently at loss as to how to approach this problem. Would it be a big deal for you to add i/o session and context logging using the following classes from the contrib package [1] and try to produce a log exhibiting the problem? Oleg [1] http://svn.apache.org/repos/asf/httpcomponents/contrib/httpcore/branches/4.2.x/src/main/java/org/apache/http/contrib/logging/
          Hide
          Scott Stanton added a comment - - edited

          I'm having a hard time reproducing this problem in any small context. It's an infrequent race condition, but it's definitely happening often enough to be a critical problem for me.

          I am using a HttpAsyncRequestExecutor with DefaultHttpClientIODispatch, DefaultConnectingIOReactor, SSLNHttpClientConnectionFactory. I am still trying to get a memory dump from a hung server, but the one time I was able to attach with a debugger, it looked like the stream was at EOF but for some reason it kept registering for interest in readable events.

          The one other detail that might help with repro is that I'm using the BouncyCastle crypto suite as the default security provider. It may behave a little differently from the oracle crypto stack, although I would expect that to be at a lower layer.

          Show
          Scott Stanton added a comment - - edited I'm having a hard time reproducing this problem in any small context. It's an infrequent race condition, but it's definitely happening often enough to be a critical problem for me. I am using a HttpAsyncRequestExecutor with DefaultHttpClientIODispatch, DefaultConnectingIOReactor, SSLNHttpClientConnectionFactory. I am still trying to get a memory dump from a hung server, but the one time I was able to attach with a debugger, it looked like the stream was at EOF but for some reason it kept registering for interest in readable events. The one other detail that might help with repro is that I'm using the BouncyCastle crypto suite as the default security provider. It may behave a little differently from the oracle crypto stack, although I would expect that to be at a lower layer.
          Hide
          Oleg Kalnichevski added a comment -

          Scott
          What I was trying to say was whether or not you were using a custom NHttpClientEventHandler implementation [1] or the one shipped with HttpCore [2]? The default protocol handlers should automatically shut down the underlying connection as soon as the end of stream condition has been detected [3].

          Is there any chance you could share the certificates or some code that I could use to reproduce the problem?

          Oleg

          [1] http://hc.apache.org/httpcomponents-core-ga/httpcore-nio/xref/org/apache/http/nio/NHttpClientEventHandler.html
          [2] http://hc.apache.org/httpcomponents-core-ga/httpcore-nio/xref/org/apache/http/nio/protocol/HttpAsyncRequestExecutor.html
          [3] http://hc.apache.org/httpcomponents-core-ga/httpcore-nio/xref/org/apache/http/nio/protocol/HttpAsyncRequestExecutor.html#256

          Show
          Oleg Kalnichevski added a comment - Scott What I was trying to say was whether or not you were using a custom NHttpClientEventHandler implementation [1] or the one shipped with HttpCore [2] ? The default protocol handlers should automatically shut down the underlying connection as soon as the end of stream condition has been detected [3] . Is there any chance you could share the certificates or some code that I could use to reproduce the problem? Oleg [1] http://hc.apache.org/httpcomponents-core-ga/httpcore-nio/xref/org/apache/http/nio/NHttpClientEventHandler.html [2] http://hc.apache.org/httpcomponents-core-ga/httpcore-nio/xref/org/apache/http/nio/protocol/HttpAsyncRequestExecutor.html [3] http://hc.apache.org/httpcomponents-core-ga/httpcore-nio/xref/org/apache/http/nio/protocol/HttpAsyncRequestExecutor.html#256
          Hide
          Scott Stanton added a comment -

          SSLIOSession.handler is null. I'm using SSLNHttpClientConnectionFactory. I have the BouncyCastle cipher suite at the head of the ciphers list. What else would be relevant here?

          I worked around the problem with the following hack:

          — C:\cygwin\tmp\t11240t143.tmp 2012-10-16 10:22:12.017333200 -0700
          +++ c:\commander\ThirdParty\main\httpcomponents-core\httpcore-nio\src\main\java\org\apache\http\nio\reactor\ssl\SSLIOSession.java 2012-10-15 20:14:58.550000000 -0700
          @@ -366,6 +366,8 @@
          int bytesRead = receiveEncryptedData();
          if (bytesRead == -1)

          { this.endOfStream = true; + close(); + return false; }

          doHandshake();
          decryptData();

          There may be a better way to handle this, but it does appear to resolve the cpu loops when the server rejects a certificate.

          Show
          Scott Stanton added a comment - SSLIOSession.handler is null. I'm using SSLNHttpClientConnectionFactory. I have the BouncyCastle cipher suite at the head of the ciphers list. What else would be relevant here? I worked around the problem with the following hack: — C:\cygwin\tmp\t11240t143.tmp 2012-10-16 10:22:12.017333200 -0700 +++ c:\commander\ThirdParty\main\httpcomponents-core\httpcore-nio\src\main\java\org\apache\http\nio\reactor\ssl\SSLIOSession.java 2012-10-15 20:14:58.550000000 -0700 @@ -366,6 +366,8 @@ int bytesRead = receiveEncryptedData(); if (bytesRead == -1) { this.endOfStream = true; + close(); + return false; } doHandshake(); decryptData(); There may be a better way to handle this, but it does appear to resolve the cpu loops when the server rejects a certificate.
          Hide
          Oleg Kalnichevski added a comment -

          Hmmm. The SSLIOSession instances are not supposed to shut down by themselves. It is the responsibility of the protocol handler to terminate the session properly in case the channel has been closed by the peer on the opposite endpoint. Are using a custom protocol handler?

          Oleg

          Show
          Oleg Kalnichevski added a comment - Hmmm. The SSLIOSession instances are not supposed to shut down by themselves. It is the responsibility of the protocol handler to terminate the session properly in case the channel has been closed by the peer on the opposite endpoint. Are using a custom protocol handler? Oleg

            People

            • Assignee:
              Oleg Kalnichevski
              Reporter:
              Scott Stanton
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development