Summary: | ssl renegotiation hangs with long ca list | ||
---|---|---|---|
Product: | Apache httpd-2 | Reporter: | Lassi Tuura <lat> |
Component: | mod_ssl | Assignee: | Apache HTTPD Bugs Mailing List <bugs> |
Status: | RESOLVED FIXED | ||
Severity: | blocker | CC: | David.Smith, keven.boudreau, litmaath, pahuja, steve |
Priority: | P2 | ||
Version: | 2.2.11 | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Windows Server 2003 | ||
Attachments: |
extra debugging for mod_ssl
test case for the bug |
Description
Lassi Tuura
2009-04-01 12:48:16 UTC
This might be an openssl bug. Have you tried if this still occurs with a more recent version of openssl? Thanks, no I haven't tried a more recent openssl yet, will try that later. We also noticed this at the same time , we use the same CA set as Lassi. There is a log of our investigation here. https://savannah.cern.ch/bugs/index.php?48458 including a work around for one case which fails to work in another case. The debian bug. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=478470 for a secure exim is the looking very close to the problem and has some explanation. Steve Hi Ruediger, I tried with a newer openssl - Fedora Core 10 has: httpd-2.2.11-2.fc10.x86_64 openssl-0.9.8g-12.fc10.x86_64 mod_ssl-2.2.11-2.fc10.x86_64 and the same thing happens. Perhaps more interesting is that if you use openssl's tiny web server then everything is okay. i.e openssl s_server -accept 8443 -verify 5 \ -cert /etc/grid-security/hostcert.pem \ -key /etc/grid-security/hostkey.pem -debug \ -CApath /etc/grid-security/certificates/ -www to set up a web server and then openssl s_client -connect vtb-generic-70.cern.ch:8443 \ -cert ~/.globus/usercert.pem \ -key ~/.globus/userkey.pem \ -debug -CApath /etc/grid-security/certificates/ << EOF GET / HTTP/1.0 EOF then the debug page comes back. i.e it suggests that while openssl is okay something faulty with the mod_ssl layer? Steve A ticket has been opened in the OpenSSL request tracker: http://rt.openssl.org/Ticket/Display.html?id=1949 account: guest password: guest Created attachment 23831 [details]
test case for the bug
I've attached a test case for the bug, which generates test
certificate to trigger the problem.
The first point is that the certificate of the server has to be
bigger than OpenSSL's buffer size, which is 4kB by default. I have
padded the server certificate with comments.
One can play with this number by using the '--pads 123' option.
The second point is to create enough CA entries that the summary
of ServerHello, Certificate, ServerKeyExchange, CertificateRequest
and ServerHelloDone record sizes adds up over 12kB (4kB of the
buffer in OpenSSL and 8kB for the BIO of mod_ssl).
One can play with this number by using the '--cas 123' option.
Run the script and follow the instructions:
../test-certs
./httpd/server start
./httpd/client # which shall hang
./httpd/server stop
You can clean up/regernerate with different parameters:
../test-certs --cas 70 # works
../test-certs --cas 80 # hangs
../test-certs --cas 110 # hangs
../test-certs --cas 120 # works
I hope it clarifies the problem!
For an additional mod_ssl / openssl data point, I confirm I have verified the problem exists using the just-attached "test case for the bug" (or rather version I tested yesterday) on Mac OS X 10.4.11 with: $ openssl version OpenSSL 0.9.8i 15 Sep 2008 $ xxx/bin/httpd -V Server version: Apache/2.2.10 (Unix) Server built: Dec 17 2008 00:20:21 Server's Module Magic Number: 20051115:18 Server loaded: APR 1.3.3, APR-Util 1.3.4 Compiled using: APR 1.3.3, APR-Util 1.3.4 Architecture: 32-bit Server MPM: Prefork threaded: no forked: yes (variable process count) Server compiled with.... -D APACHE_MPM_DIR="server/mpm/prefork" -D APR_HAS_MMAP -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled) -D APR_USE_SYSVSEM_SERIALIZE -D APR_USE_PTHREAD_SERIALIZE -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT -D APR_HAS_OTHER_CHILD -D AP_HAVE_RELIABLE_PIPED_LOGS -D DYNAMIC_MODULE_LIMIT=128 -D HTTPD_ROOT="xxx" -D SUEXEC_BIN="xxx/bin/suexec" -D DEFAULT_PIDLOG="logs/httpd.pid" -D DEFAULT_SCOREBOARD="logs/apache_runtime_status" -D DEFAULT_LOCKFILE="logs/accept.lock" -D DEFAULT_ERRORLOG="logs/error_log" -D AP_TYPES_CONFIG_FILE="conf/mime.types" -D SERVER_CONFIG_FILE="conf/httpd.conf" Hello, I have been making some investigation of the problem - and wanted to summarise the current understanding of the details - which I'm fairly clear about. I've also included a tentative fix, but as I explain there are some considerations - the wider audience here will probably have some more ideas. (I'll write a similar report on the openssl bug list, as there are also openssl issues raised). Apparently the problem (dialogue stall) arrises because of the two layers of buffering between mod_ssl's filter_out BIO and the wbio buffer BIO (i.e. a BIO_f_buffer()), which openssl adds to the stack of wbio BIOs for most of the handshake process. In particular s3_srvr.c (in openssl) uses this check during the flush state: case SSL3_ST_SW_FLUSH: /* number of bytes to be flushed */ num1=BIO_ctrl(s->wbio,BIO_CTRL_INFO,0,NULL); if (num1 > 0) { s->rwstate=SSL_WRITING; num1=BIO_flush(s->wbio); if (num1 <= 0) { ret= -1; goto end; } s->rwstate=SSL_NOTHING; } The BIO_CTRL_INFO call for the BIO_f_buffer (first in the list of s->wbio stack) only checks the number of buffered outgoing bytes in that BIO; so if the BIO_f_buffer has no bytes remaining to be written, but mod_ssl's filter_out BIO does then BIO_flush is not called - bytes remain in the filter_out BIO and the dialogue stalls. e.g. observed sequence during a renegotiate sequence that stalls: bio_filter_out_write receives 37 bytes (SSL renegotiate ciphers request) 37 bytes added to outctx->buffer: total 37 bytes Flushed from s3_srvr.c: BIO_flush called on bio_filter out. outctx->buffer is 0 again. openssl adds in the BIO_f_buffer BIO to the ssl wbio (via ssl_init_wbio_buffer()) openssl server sends HELLO A: 101 bytes Added to BIO_f_buffer: buffer total = 101 bytes (sending from BIO_f_buffer to filter_out is done in crypto routine buffer_write(), bf_buff.c) openssl sever sends SERVER CERT A: 4325 bytes (buffer_write()) fills BIO_f_buffer with extra 3995 bytes: total = 4096 4096 bytes sent to bio_filter_out_write: outctx->buffer now 4096 bytes BIO_f_buffer: 0 bytes buffered Remainder bytes added to BIO_f_buffer: 330 bytes buffered openssl server sends KEY EXCHANGE A: 565 bytes BIO_f_buffer receives 565 bytes: total 895 bytes openssl server sends CERT REQ A: 7797 bytes BIO_f_buffer adds 3201 to buffer making total of 4096: 4096 bytes sent to bio_filter_out_write: All 8192 bytes flushed. outctx->buffer is 0 again. BIO_f_buffer: 0 bytes buffered Remainder of 4596 sent to bio_filter_out_write(): Added to outctx->buffer for total of 4596 buffed bytes. openssl server passes through SSL3_ST_SW_FLUSH state and checks wbio (i.e. BIO_f_buffer) with BIO_CTRL_INFO: Returns 0 as there is nothing buffered, and so no call BIO_flush on wbio. Thus no flush call to mod_ssl's filter_out BIO. Trial fix - a change in both openssl and mod_ssl: --- s3_srvr.c- 2009-06-17 20:44:54.000000000 +0200 +++ s3_srvr.c 2009-06-17 17:20:33.000000000 +0200 @@ -413,7 +413,7 @@ case SSL3_ST_SW_FLUSH: /* number of bytes to be flushed */ - num1=BIO_ctrl(s->wbio,BIO_CTRL_INFO,0,NULL); + num1=BIO_ctrl(s->wbio,BIO_CTRL_WPENDING,0,NULL); if (num1 > 0) { s->rwstate=SSL_WRITING; --- ssl_engine_io.c- 2009-06-17 20:48:19.000000000 +0200 +++ ssl_engine_io.c 2009-06-17 17:41:07.000000000 +0200 @@ -260,10 +260,10 @@ case BIO_CTRL_SET_CLOSE: bio->shutdown = (int)num; break; - case BIO_CTRL_WPENDING: + case BIO_CTRL_PENDING: ret = 0L; break; - case BIO_CTRL_PENDING: + case BIO_CTRL_WPENDING: ret = (long)(outctx->blen + outctx->length); break; case BIO_CTRL_FLUSH: the change in openssl is to modify the BIO_ctrl function used in the flush check to BIO_CTRL_WPENDING: That call on the BIO_f_buffer returns either number of buffer outgoing bytes (similar to BIO_CTRL_INFO) or if zero calls BIO_CTRL_WPENDING on the next_bio (e.g. mod_ssl's filter_out BIO). In the there is no BIO_f_buffer (e.g. when the SSL renegotiate ciphers request is sent) s3_srvr will directly call BIO_CTRL_WPENDING on the filter_out BIO. Unfortunately it looks like BIO_CTRL_PENDING and BIO_CTRL_WPENDING are reversed in ssl_engine_io.c: so while I've seen the patch overall appears to work without problem it would need to be applied to mod_ssl before or at the same time as openssl. It also has the effect of changing the observed behavior for any application providing their own BIO: WPENDING is called rather than BIO_CTRL_INFO. Possibly the SSL3_ST_SW_FLUSH BIO_ctrl check in s3_srvr.c could be explicitly made up of a BIO_CTRL_INFO call, and then a further WPENDNING call only if zero bytes are returned. Interestingly in the above example the stall would be avoided with only a change in bio_filter_out_write(); to buffer bytes up to 8192, not 8191 bytes as current check does: But this relies on the BIO_f_buffer size being 4096 bytes, and there is still an apparent sequence of writes possible to leave the BIO_f_buffer empty but the filter_out BIO full. David Superb analysis, thanks David. Your proposed change to bio_filter_out_ctrl() makes sense to me. Hello Joe. We have an RT ticket (#1949) open with openssl for the same problem; I've alerted them there is more detail and the test case here. I hope they will consider the openssl/crypto suggestion and give some feedback. I've committed your fix in r787722. It doesn't look like this will make any different to existing OpenSSL implementations - I can't see anywhere significant in ssl/ which is calling BIO_pending, BIO_wpending, or BIO_ctrl with _PENDING/_WPENDING similarly. For reference: http://marc.info/?l=openssl-dev&m=124575128524176&w=2 I've committed a change to have mod_ssl flush pending output unconditionally in r788715, which has been confirmed to fix this issue. Thanks again for your work on this, David. (In reply to comment #12) > For reference: http://marc.info/?l=openssl-dev&m=124575128524176&w=2 > > I've committed a change to have mod_ssl flush pending output unconditionally in > r788715, which has been confirmed to fix this issue. > > Thanks again for your work on this, David. Is there some reason this hasn't been backported to 2.2.X? Does anyone have an update on this issue? We hit this exact issue on both firefox and IE when using SSLVerifyClient on a particular location directive. It is reproducable every time. Shrinking the CA Size to < 200k helped for the most part, but there are still cases where we get the renegotiation error. We are running a slightly customized build of Apache 2.2.15 and OpenSSL 0.9.8.k The issue can be reproduced easily with the binaries on httpd.apache.org with the OpenSSL they ship as well. Basically here is the issue. ca-bundle.crt is 253k with a hundred or so CA's in it (generated from Mozilla certdata.txt) 1. User connects to https://server/logonx509 via IE or Firefox 2. URL is protected using this directive: <location logonx509> SSLOptions +StdEnvVars +ExportCertData SSLRequireSSL SSLVerifyClient require SSLVerifyDepth 10 RequestHeader set SSL_CLIENT_CERT "%{SSL_CLIENT_CERT}e" </location> 3. Firefox connects will prompt for smartcard pin, authenticate then fail on re-negotiation. 4. IE connects will prompt for smartcard pin, authenticate then fail on re-negotiation. Shrinking CA size will greatly help, but not always. Typical error in our apache ssl error logs is: Thu May 13 10:53:49 2010] [debug] ssl_engine_io.c(1893): OpenSSL: I/O error, 5 bytes expected to read on BIO#7d7d480 [mem: 7dd72e8] [Thu May 13 10:53:49 2010] [debug] ssl_engine_kernel.c(1903): OpenSSL: Exit: error in SSLv3 read client certificate A [Thu May 13 10:53:49 2010] [error] [client x.x.x.x] Re-negotiation handshake failed: Not accepted by client!?, referer: https://x.x.x.x/sessionmanager/login.jsp?back=https%3a%2f%2fx.x.x.x%2fem I've reviewed this thread in depth and am not sure it resolves all of the issues. Any help appreciated here. Does anyone have an update on this issue? We hit this exact issue on both firefox and IE when using SSLVerifyClient on a particular location directive. It is reproducable every time. Shrinking the CA Size to < 200k helped for the most part, but there are still cases where we get the renegotiation error. We are running a slightly customized build of Apache 2.2.15 and OpenSSL 0.9.8.k The issue can be reproduced easily with the binaries on httpd.apache.org with the OpenSSL they ship as well. Basically here is the issue. ca-bundle.crt is 253k with a hundred or so CA's in it (generated from Mozilla certdata.txt) 1. User connects to https://server/logonx509 via IE or Firefox 2. URL is protected using this directive: <location logonx509> SSLOptions +StdEnvVars +ExportCertData SSLRequireSSL SSLVerifyClient require SSLVerifyDepth 10 RequestHeader set SSL_CLIENT_CERT "%{SSL_CLIENT_CERT}e" </location> 3. Firefox connects will prompt for smartcard pin, authenticate then fail on re-negotiation. 4. IE connects will prompt for smartcard pin, authenticate then fail on re-negotiation. Shrinking CA size will greatly help, but not always. Typical error in our apache ssl error logs is: Thu May 13 10:53:49 2010] [debug] ssl_engine_io.c(1893): OpenSSL: I/O error, 5 bytes expected to read on BIO#7d7d480 [mem: 7dd72e8] [Thu May 13 10:53:49 2010] [debug] ssl_engine_kernel.c(1903): OpenSSL: Exit: error in SSLv3 read client certificate A [Thu May 13 10:53:49 2010] [error] [client x.x.x.x] Re-negotiation handshake failed: Not accepted by client!?, referer: https://x.x.x.x/sessionmanager/login.jsp?back=https%3a%2f%2fx.x.x.x%2fem I've reviewed this thread in depth and am not sure it resolves all of the issues. Any help appreciated here. One more update Using apache 2.2.15 and openssl 1.0.0 the error we get has a bit more info [Thu May 13 16:51:56 2010] [debug] ssl_engine_kernel.c(1903): OpenSSL: Exit: error in SSLv3 flush data [Thu May 13 16:51:56 2010] [error] [client x.x.x.x] Re-negotiation handshake failed: Not accepted by client!?, referer: https://x.x.x.x/sessionmanager/login.jsp?back=https%3a%2f%2fx.x.x.x%2fconsole%2f [Thu May 13 16:52:03 2010] [debug] ssl_engine_io.c(1893): OpenSSL: I/O error, 5 bytes expected to read on BIO#14d33f8 [mem: 15fe15b] [Thu May 13 16:52:03 2010] [info] [client 2002:10d5:7868::10d5:7868] (OS 10060)A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. : SSL input filter read failed. [Thu May 13 16:52:03 2010] [debug] ssl_engine_kernel.c(1884): OpenSSL: Write: SSL negotiation finished successfully [Thu May 13 16:52:03 2010] [info] [client 2002:10d5:7868::10d5:7868] Connection closed to child 242 with standard shutdown (server qanj318.americas.hpqcorp.net:443) Hi Steve, a workaround for current/older versions of httpd/openssl is to put a lot of _dummy_ CAs on the server. That is what we do for now, since it will be a while before we can rely on patched versions being supplied by the distributions that matter to us. We constructed an rpm with 50 dummy CAs that are sufficient to get us beyond the zone of trouble, as we have about 90 real CAs that we need to support. The dummy-ca-certs-20090630-1.noarch.rpm is available here: http://glitesoft.cern.ch/LCG-CAs/current/RPMS.production/ See comment at the end of this page: http://grid-deployment.web.cern.ch/grid-deployment/lcg2CAlist.html The fix for this has been merged to 2.2.x. I forgot: the fix for this specific issue is in 2.2.15. If you are seeing reneg failures with 2.2.15 it is likely to be related to the fixes for CVE-2009-3555. Please contact users@httpd.apache.org in the first instance for help diagnosing the issue. (In reply to comment #19) > I forgot: the fix for this specific issue is in 2.2.15. > > If you are seeing reneg failures with 2.2.15 it is likely to be related to the > fixes for CVE-2009-3555. Please contact users@httpd.apache.org in the first > instance for help diagnosing the issue. I am not able to get around this problem for Apache 2.2.17 (with openssl 0.9.8o) , 2.2.19 (with openssl 0.9.8r) and 2.2.19 (with openssl 1.0.0d) windows server for ios clients. Strangely the Linux server works fine but the windows version shows the same problem with ios clients (I am able to get it working for the desktop browsers, openssl -s_client and ios client with Apache on linux). Could there be some issue on ios client or somehow Apache on windows is not flushing data even thought the openssl is forcing a flush? FWIW, there is also a report in Debian about this issue still affecting 2.2.16: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630888 But I haven't looked into it. (In reply to comment #20) > (In reply to comment #19) > > I forgot: the fix for this specific issue is in 2.2.15. > > > > If you are seeing reneg failures with 2.2.15 it is likely to be related to the > > fixes for CVE-2009-3555. Please contact users@httpd.apache.org in the first > > instance for help diagnosing the issue. > I am not able to get around this problem for Apache 2.2.17 (with openssl > 0.9.8o) , 2.2.19 (with openssl 0.9.8r) and 2.2.19 (with openssl 1.0.0d) windows > server for ios clients. Strangely the Linux server works fine but the windows > version shows the same problem with ios clients (I am able to get it working > for the desktop browsers, openssl -s_client and ios client with Apache on > linux). Could there be some issue on ios client or somehow Apache on windows is > not flushing data even thought the openssl is forcing a flush? Hello, I also have that problem under Windows Server 2003 (haven't been able to test it on a different version/OS). I am currently with 2.2.17 (win32) and OpenSSL 0.9.8o and I will also test with latest stable versions. Your talking about data flush? Is it something that can be done manually to restore temporarly the services? I saw there was a patch release but doesn't seem to resolve the issue. This patch is included in 2.2.17 or higher? FYI, here's the error I have in the apache logs. --- [Fri Jun 03 13:46:45 2011] [info] Subsequent (No.100) HTTPS request received for child 970 (server genesys:443) [Fri Jun 03 13:46:45 2011] [debug] ssl_engine_io.c(1708): OpenSSL: I/O error, 5 bytes expected to read on BIO#108da8d8 [mem: 10514300] [Fri Jun 03 13:46:45 2011] [info] (OS 10060)A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. : SSL input filter read failed. [Fri Jun 03 13:46:45 2011] [debug] ssl_engine_kernel.c(1749): OpenSSL: Write: SSL negotiation finished successfully [Fri Jun 03 13:46:45 2011] [info] Connection to child 956 closed with standard shutdown(server genesys:443, client 172.26.69.60) --- Thank you. Try using a fully qualified domain name as CN of the server certificate. --Puneet (In reply to comment #22) > (In reply to comment #20) > > (In reply to comment #19) > > > I forgot: the fix for this specific issue is in 2.2.15. > > > > > > If you are seeing reneg failures with 2.2.15 it is likely to be related to the > > > fixes for CVE-2009-3555. Please contact users@httpd.apache.org in the first > > > instance for help diagnosing the issue. > > I am not able to get around this problem for Apache 2.2.17 (with openssl > > 0.9.8o) , 2.2.19 (with openssl 0.9.8r) and 2.2.19 (with openssl 1.0.0d) windows > > server for ios clients. Strangely the Linux server works fine but the windows > > version shows the same problem with ios clients (I am able to get it working > > for the desktop browsers, openssl -s_client and ios client with Apache on > > linux). Could there be some issue on ios client or somehow Apache on windows is > > not flushing data even thought the openssl is forcing a flush? > > Hello, > > I also have that problem under Windows Server 2003 (haven't been able to test > it on a different version/OS). I am currently with 2.2.17 (win32) and OpenSSL > 0.9.8o and I will also test with latest stable versions. > Your talking about data flush? Is it something that can be done manually to > restore temporarly the services? I saw there was a patch release but doesn't > seem to resolve the issue. This patch is included in 2.2.17 or higher? > > FYI, here's the error I have in the apache logs. > --- > [Fri Jun 03 13:46:45 2011] [info] Subsequent (No.100) HTTPS request received > for child 970 (server genesys:443) > [Fri Jun 03 13:46:45 2011] [debug] ssl_engine_io.c(1708): OpenSSL: I/O error, 5 > bytes expected to read on BIO#108da8d8 [mem: 10514300] > [Fri Jun 03 13:46:45 2011] [info] (OS 10060)A connection attempt failed because > the connected party did not properly respond after a period of time, or > established connection failed because connected host has failed to respond. : > SSL input filter read failed. > [Fri Jun 03 13:46:45 2011] [debug] ssl_engine_kernel.c(1749): OpenSSL: Write: > SSL negotiation finished successfully > [Fri Jun 03 13:46:45 2011] [info] Connection to child 956 closed with standard > shutdown(server genesys:443, client 172.26.69.60) > --- > > Thank you. There is not enough data in comments since this bug was reopened to be able to diagnose any new bug. I suggest those seeing problems with mod_ssl first try to resolve these via the users@ mailing list: http://httpd.apache.org/lists.html#http-users If you can diagnose a further bug with an httpd "hang" during an SSL handshake you need to provide at least: a) debug-level log output where the timestamps clearly show a hang, and/o b) network traces and strace output, along with command-line client output. c) openssl and httpd version |