Uploaded image for project: 'HttpComponents HttpCore'
  1. HttpComponents HttpCore
  2. HTTPCORE-322

SSLIOSession infinite loop after alert during handshake

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.3
    • Component/s: HttpCore NIO
    • Labels:
      None

      Description

      I have an application using httpcore-nio that occasionally gets stuck in an infinite loop in SSLIOSession:
      "I/O dispatcher 1" prio=10 tid=0x00002aaab82da000 nid=0x5448 runnable [0x000000004271c000]
      java.lang.Thread.State: RUNNABLE
      at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:767)

      • locked <0x00000000a4f03420> (a sun.security.ssl.SSLEngineImpl)
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:719)
      • locked <0x00000000a51faaa0> (a java.lang.Object)
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
        at org.apache.http.nio.reactor.ssl.SSLIOSession.doUnwrap(SSLIOSession.java:228)
        at org.apache.http.nio.reactor.ssl.SSLIOSession.decryptData(SSLIOSession.java:348)
        at org.apache.http.nio.reactor.ssl.SSLIOSession.isAppInputReady(SSLIOSession.java:371)
      • locked <0x00000000a4f03498> (a org.apache.http.nio.reactor.ssl.SSLIOSession)
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:118)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:160)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:342)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:320)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:106)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:604)
        at java.lang.Thread.run(Thread.java:636)

      In the above example, I was able to determine that this behavior was caused by a client sending a ClientHello that resumed an established session, immediately followed by a CloseNotify Alert. This sequence only sometimes triggers the infinite loop, so the problem appears to be timing related. I get the feeling that there are other sequences that may trigger this, but I don't have any other concrete examples at the moment.

      1. Test.java
        10 kB
        Paul Donohue

        Issue Links

          Activity

          Hide
          paulsd Paul Donohue added a comment -

          I have confirmed that a ClientHello followed by CloseNotify when resuming a session no longer causes an infinite loop when using this fix. My test program also works properly now.

          Thanks!

          Show
          paulsd Paul Donohue added a comment - I have confirmed that a ClientHello followed by CloseNotify when resuming a session no longer causes an infinite loop when using this fix. My test program also works properly now. Thanks!
          Hide
          olegk Oleg Kalnichevski added a comment -

          Paul

          Fundamentally, the core issue was that the #decryptData() method did not take the handshake status into consideration.

          I applied a variation of your original patch to the 4.2.x branch [1]. Could you please re-test your application with the latest SVN snapshot and confirm that the problem has been resolved?

          Once the fix is confirmed I'll start working on releasing version 4.2.3.

          Oleg

          [1] http://svn.apache.org/viewvc?view=revision&revision=1410793

          Show
          olegk Oleg Kalnichevski added a comment - Paul Fundamentally, the core issue was that the #decryptData() method did not take the handshake status into consideration. I applied a variation of your original patch to the 4.2.x branch [1] . Could you please re-test your application with the latest SVN snapshot and confirm that the problem has been resolved? Once the fix is confirmed I'll start working on releasing version 4.2.3. Oleg [1] http://svn.apache.org/viewvc?view=revision&revision=1410793
          Hide
          paulsd Paul Donohue added a comment -

          BTW, in case anyone else is following along and is wondering why you proposed a slightly different patch than I did - the reason is that my patch would have broken SSL renegotiation, while your patch handles renegotiation properly.

          Note that despite the fact that I was previously wrong about the exact sequence of events leading up to the infinite loop, the committed patch should still fix the problem.

          Show
          paulsd Paul Donohue added a comment - BTW, in case anyone else is following along and is wondering why you proposed a slightly different patch than I did - the reason is that my patch would have broken SSL renegotiation, while your patch handles renegotiation properly. Note that despite the fact that I was previously wrong about the exact sequence of events leading up to the infinite loop, the committed patch should still fix the problem.
          Hide
          paulsd Paul Donohue added a comment -

          Test which emulates having SSLEngine queue up multiple outgoing SSL records for wrapping while incoming SSL records are still buffered for unwrapping.

          Show
          paulsd Paul Donohue added a comment - Test which emulates having SSLEngine queue up multiple outgoing SSL records for wrapping while incoming SSL records are still buffered for unwrapping.
          Hide
          paulsd Paul Donohue added a comment -

          I suddenly had an epiphany ... I was wrong before - timing doesn't actually have anything to do with this issue, so I was only occasionally accidentally generating the proper conditions when I was messing around with timing.

          Here's what actually happened:

          • Connection was established and client sent the ClientHello and CloseNotify. Exact timing doesn't matter except that the CloseNotify must be received before the server processes the ClientHello and queues up the ServerHello, ChangeCipherSpec, and Finished messages.
          • isAppInputReady() called doHandshake(), which got NEED_UNWRAP and called doUnwrap(), which processed the ClientHello (and queued the ServerHello, ChangeCipherSpec, and Finished messages) but left the CloseNotify in the buffer
          • The next loop in doHandshake() got NEED_WRAP, and called doWrap() which processed the ServerHello but left the ChangeCipherSpec and Finished messages in the queue.
          • The next loop in doHandshake() got NEED_WRAP, and called doWrap() again
          • SSLIOSession allocates SSLEngine.getSession().getPacketBufferSize() bytes for outEncrypted and SSLEngine.wrap() returns BUFFER_OVERFLOW if outEncrypted.remaining() < outputRecord.maxRecordSize, which basically means that BUFFER_OVERFLOW is always returned if there is any data in outEncrypted and any SSL records queued for wrapping
          • So, doWrap() returned BUFFER_OVERFLOW and doHandshake() returned
          • isAppInputReady() called decryptData()
          • doUnwrap() calls SSLEngine.unwrap() calls SSLEngineImpl.readNetRecord() calls SSLEngineImpl.getHSStatus(), which returns NEED_WRAP due to the pending messages that need to be written out, which causes doUnwrap() to return with Status.OK
          • decryptData() loops forever calling doUnwrap(), which keeps returning NEED_WRAP

          I previously thought that HTTPCORE-319 was a separate (though similar) issue, but I now think it is probably the same issue. I'm guessing that the server writes out some messages as soon as it detects a bad client certificate and before it finishes processing the client handshake messages, which leads to the exact same infinite loop.

          I'm attaching a test program which fakes out the SSLEngine and network Channel to induce the conditions which led to the infinite loop.

          Show
          paulsd Paul Donohue added a comment - I suddenly had an epiphany ... I was wrong before - timing doesn't actually have anything to do with this issue, so I was only occasionally accidentally generating the proper conditions when I was messing around with timing. Here's what actually happened: Connection was established and client sent the ClientHello and CloseNotify. Exact timing doesn't matter except that the CloseNotify must be received before the server processes the ClientHello and queues up the ServerHello, ChangeCipherSpec, and Finished messages. isAppInputReady() called doHandshake(), which got NEED_UNWRAP and called doUnwrap(), which processed the ClientHello (and queued the ServerHello, ChangeCipherSpec, and Finished messages) but left the CloseNotify in the buffer The next loop in doHandshake() got NEED_WRAP, and called doWrap() which processed the ServerHello but left the ChangeCipherSpec and Finished messages in the queue. The next loop in doHandshake() got NEED_WRAP, and called doWrap() again SSLIOSession allocates SSLEngine.getSession().getPacketBufferSize() bytes for outEncrypted and SSLEngine.wrap() returns BUFFER_OVERFLOW if outEncrypted.remaining() < outputRecord.maxRecordSize, which basically means that BUFFER_OVERFLOW is always returned if there is any data in outEncrypted and any SSL records queued for wrapping So, doWrap() returned BUFFER_OVERFLOW and doHandshake() returned isAppInputReady() called decryptData() doUnwrap() calls SSLEngine.unwrap() calls SSLEngineImpl.readNetRecord() calls SSLEngineImpl.getHSStatus(), which returns NEED_WRAP due to the pending messages that need to be written out, which causes doUnwrap() to return with Status.OK decryptData() loops forever calling doUnwrap(), which keeps returning NEED_WRAP I previously thought that HTTPCORE-319 was a separate (though similar) issue, but I now think it is probably the same issue. I'm guessing that the server writes out some messages as soon as it detects a bad client certificate and before it finishes processing the client handshake messages, which leads to the exact same infinite loop. I'm attaching a test program which fakes out the SSLEngine and network Channel to induce the conditions which led to the infinite loop.
          Hide
          paulsd Paul Donohue added a comment -

          OK, I've added logging. I typically run into this problem only once every couple months, so it might be a LONG time before I have any more info for you.

          Show
          paulsd Paul Donohue added a comment - OK, I've added logging. I typically run into this problem only once every couple months, so it might be a LONG time before I have any more info for you.
          Hide
          olegk Oleg Kalnichevski added a comment -

          Paul
          Could you please try out a slightly different variation of the fix you proposed and let me know if that resolves the problem for you.

          diff --git a/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java b/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java
          index 6c08208..8313c96 100644
          — a/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java
          +++ b/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java
          @@ -368,7 +368,9 @@ public class SSLIOSession implements IOSession, SessionBufferStatus, SocketAcces
          this.endOfStream = true;
          }
          doHandshake();

          • decryptData();
            + if (sslEngine.getHandshakeStatus() == HandshakeStatus.NOT_HANDSHAKING) { + decryptData(); + }

            // Some decrypted data is available or at the end of stream
            return (this.appEventMask & SelectionKey.OP_READ) > 0
            && (this.inPlain.position() > 0

          Show
          olegk Oleg Kalnichevski added a comment - Paul Could you please try out a slightly different variation of the fix you proposed and let me know if that resolves the problem for you. diff --git a/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java b/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java index 6c08208..8313c96 100644 — a/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java +++ b/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java @@ -368,7 +368,9 @@ public class SSLIOSession implements IOSession, SessionBufferStatus, SocketAcces this.endOfStream = true; } doHandshake(); decryptData(); + if (sslEngine.getHandshakeStatus() == HandshakeStatus.NOT_HANDSHAKING) { + decryptData(); + } // Some decrypted data is available or at the end of stream return (this.appEventMask & SelectionKey.OP_READ) > 0 && (this.inPlain.position() > 0
          Hide
          olegk Oleg Kalnichevski added a comment -

          OK. No luck with a reproducer. Let's try logging. Could you please add i/o session and context logging as described in HTTPCORE-319? Feel free to tweak SSLIOSession and add more detailed logs including SSL engine status or wrap / unwrap operation results.

          Oleg

          Show
          olegk Oleg Kalnichevski added a comment - OK. No luck with a reproducer. Let's try logging. Could you please add i/o session and context logging as described in HTTPCORE-319 ? Feel free to tweak SSLIOSession and add more detailed logs including SSL engine status or wrap / unwrap operation results. Oleg
          Hide
          paulsd Paul Donohue added a comment -

          Yeah, I can't think of any way to write a test case that wouldn't be so fabricated as to be pointless. Unfortunately, timing issues and race conditions are a pain.

          Show
          paulsd Paul Donohue added a comment - Yeah, I can't think of any way to write a test case that wouldn't be so fabricated as to be pointless. Unfortunately, timing issues and race conditions are a pain.
          Hide
          garydgregory Gary Gregory added a comment -

          The idea is to be able to create a reproducible @Test test case that is not dependent on timing. But this does seem like a lot of work...

          Gary

          Show
          garydgregory Gary Gregory added a comment - The idea is to be able to create a reproducible @Test test case that is not dependent on timing. But this does seem like a lot of work... Gary
          Hide
          paulsd Paul Donohue added a comment -

          In my case, both sides weren't running in the same JVM - there was a separate client and server. I suppose I could merge them, although I'm not sure that it would provide us with any new information.

          Show
          paulsd Paul Donohue added a comment - In my case, both sides weren't running in the same JVM - there was a separate client and server. I suppose I could merge them, although I'm not sure that it would provide us with any new information.
          Hide
          garydgregory Gary Gregory added a comment -

          If both side are in the same JVM, can they be synchronized through some java.util.concurrent code?

          Show
          garydgregory Gary Gregory added a comment - If both side are in the same JVM, can they be synchronized through some java.util.concurrent code?
          Hide
          paulsd Paul Donohue added a comment -

          I wrote a test application, but since the bug is highly dependent on timing, I wasn't able to reproduce the issue without artificially forcing the timing by adding sleep calls in both the client and server to make sure events happened in the proper order. However, when forced, the sequence I described above seems to occur as I described.

          Show
          paulsd Paul Donohue added a comment - I wrote a test application, but since the bug is highly dependent on timing, I wasn't able to reproduce the issue without artificially forcing the timing by adding sleep calls in both the client and server to make sure events happened in the proper order. However, when forced, the sequence I described above seems to occur as I described.
          Hide
          paulsd Paul Donohue added a comment -

          I'll attempt to write a test to reproduce it...

          Show
          paulsd Paul Donohue added a comment - I'll attempt to write a test to reproduce it...
          Hide
          paulsd Paul Donohue added a comment -

          I believe the following patch would correct this issue.
          diff -Naur httpcomponents-core-4.2.2.orig/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java httpcomponents-core-4.2.2/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java
          — httpcomponents-core-4.2.2.orig/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java 2012-11-14 20:12:09.000000000 -0500
          +++ httpcomponents-core-4.2.2/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java 2012-11-15 12:14:21.000000000 -0500
          @@ -367,8 +367,11 @@
          if (bytesRead == -1)

          { this.endOfStream = true; }
          • doHandshake();
          • decryptData();
            + if(this.sslEngine.getHandshakeStatus() != NOT_HANDSHAKING) { + doHandshake(); + }

            else

            { + decryptData(); + }

            // Some decrypted data is available or at the end of stream
            return (this.appEventMask & SelectionKey.OP_READ) > 0
            && (this.inPlain.position() > 0

          Show
          paulsd Paul Donohue added a comment - I believe the following patch would correct this issue. diff -Naur httpcomponents-core-4.2.2.orig/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java httpcomponents-core-4.2.2/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java — httpcomponents-core-4.2.2.orig/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java 2012-11-14 20:12:09.000000000 -0500 +++ httpcomponents-core-4.2.2/httpcore-nio/src/main/java/org/apache/http/nio/reactor/ssl/SSLIOSession.java 2012-11-15 12:14:21.000000000 -0500 @@ -367,8 +367,11 @@ if (bytesRead == -1) { this.endOfStream = true; } doHandshake(); decryptData(); + if(this.sslEngine.getHandshakeStatus() != NOT_HANDSHAKING) { + doHandshake(); + } else { + decryptData(); + } // Some decrypted data is available or at the end of stream return (this.appEventMask & SelectionKey.OP_READ) > 0 && (this.inPlain.position() > 0
          Hide
          olegk Oleg Kalnichevski added a comment -

          That sounds quite plausible. Would it be possible nonetheless to reproduce the problem with a test application, preferrably self contained?

          Oleg

          Show
          olegk Oleg Kalnichevski added a comment - That sounds quite plausible. Would it be possible nonetheless to reproduce the problem with a test application, preferrably self contained? Oleg
          Hide
          paulsd Paul Donohue added a comment -

          I believe the root of the problem lies in SSLIOSession.isAppInputReady(), which calls:
          doHandshake();
          decryptData();

          In the above case, I believe what happened is:

          • Connection was established, then isAppInputReady() was called before the full ClientHello was received
          • isAppInputReady() called doHandshake(), which got NEED_UNWRAP and called doUnwrap()
          • Since the full ClientHello hadn't been received yet, SSLEngineImpl.readNetRecord() returned Status.BUFFER_UNDERFLOW (which propagated back up and was returned by doUnwrap()), causing doHandshake() to return
          • isAppInputReady() called decryptData()
          • Somewhere after the underflow check in readNetRecord() and before the "this.inEncrypted.position() > 0" check in decryptData(), we finished receiving the ClientHello message and the CloseNotify Alert
          • decryptData() called doUnwrap(), which read the ClientHello and (since it was resuming a session) generated ServerHello, ChangeCipherSpec, and Finished messages, then doUnwrap() returned
          • Due to the unprocessed Alert still in the buffer, decryptData() called doUnwrap() again
          • doUnwrap() calls SSLEngine.unwrap() calls SSLEngineImpl.readNetRecord() calls SSLEngineImpl.getHSStatus(), which returns NEED_WRAP due to the pending messages that need to be written out, which causes doUnwrap() to return with Status.OK
          • decryptData() loops forever calling doUnwrap(), which keeps returning NEED_WRAP
          Show
          paulsd Paul Donohue added a comment - I believe the root of the problem lies in SSLIOSession.isAppInputReady(), which calls: doHandshake(); decryptData(); In the above case, I believe what happened is: Connection was established, then isAppInputReady() was called before the full ClientHello was received isAppInputReady() called doHandshake(), which got NEED_UNWRAP and called doUnwrap() Since the full ClientHello hadn't been received yet, SSLEngineImpl.readNetRecord() returned Status.BUFFER_UNDERFLOW (which propagated back up and was returned by doUnwrap()), causing doHandshake() to return isAppInputReady() called decryptData() Somewhere after the underflow check in readNetRecord() and before the "this.inEncrypted.position() > 0" check in decryptData(), we finished receiving the ClientHello message and the CloseNotify Alert decryptData() called doUnwrap(), which read the ClientHello and (since it was resuming a session) generated ServerHello, ChangeCipherSpec, and Finished messages, then doUnwrap() returned Due to the unprocessed Alert still in the buffer, decryptData() called doUnwrap() again doUnwrap() calls SSLEngine.unwrap() calls SSLEngineImpl.readNetRecord() calls SSLEngineImpl.getHSStatus(), which returns NEED_WRAP due to the pending messages that need to be written out, which causes doUnwrap() to return with Status.OK decryptData() loops forever calling doUnwrap(), which keeps returning NEED_WRAP

            People

            • Assignee:
              Unassigned
              Reporter:
              paulsd Paul Donohue
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development