The included patch is for OpenSSL, but it's not 100% clear to me whether the real bug is in Apache or in OpenSSL; fixing it in OpenSSL was easiest. I also emailed the bug to the OpenSSL folks.

apache version: 2.0.48-146
openssl version: 0.9.7b-125
OS: SuSE 9.0 SMP/x86_64
Kernel: 2.4.21-260-smp

The problem I'm seeing is that Apache will not perform a "ProxyPass" to another SSL host.

The OpenSSL function ssl_verify_cert_chain() [ssl/ssl_cert.c] stores the SSL* pointer in the X509_STORE_CTX context with the following code:

X509_STORE_CTX_set_ex_data(&ctx,SSL_get_ex_data_X509_STORE_CTX_idx(),s);

The Apache callback function ssl_callback_SSLVerify() [modules/ssl/ssl_engine_kernel.c] then retrieves this value with the following code:

SSL *ssl = (SSL *)X509_STORE_CTX_get_app_data(ctx);

which is just a macro that retrieves index 0 of the ex_data. This fails on the above system. I don't have an exactly matching single-processor 32-bit machine for comparison testing, but I tested on a close match and it works fine.

The following patch fixes the problem on the above system:

-----------------------
diff -Naur openssl-0.9.7b-orig/ssl/ssl_cert.c openssl-0.9.7b/ssl/ssl_cert.c
--- openssl-0.9.7b-orig/ssl/ssl_cert.c	2004-12-03 18:35:40.000000000 -0800
+++ openssl-0.9.7b/ssl/ssl_cert.c	2004-12-03 18:36:20.000000000 -0800
@@ -467,6 +467,7 @@
 	if (SSL_get_verify_depth(s) >= 0)
 		X509_STORE_CTX_set_depth(&ctx, SSL_get_verify_depth(s));
 	X509_STORE_CTX_set_ex_data(&ctx,SSL_get_ex_data_X509_STORE_CTX_idx(),s);
+	X509_STORE_CTX_set_app_data(&ctx,s);
 
 	/* We need to set the verify purpose. The purpose can be determined by
 	 * the context: if its a server it will verify SSL client certificates
-----------------------

The bug is that a callback function has no way of retrieving the value returned by SSL_get_ex_data_X509_STORE_CTX_idx(); in Apache's case it uses 0 via the X509_STORE_CTX_get_app_data() macro.
This may not be the "correct" ultimate fix, as I'm not sure whether there's a reason index 0 might not be available. The "ctx" structure above is stack-allocated and only used for the duration of the ssl_verify_cert_chain() call.
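To make the mismatch concrete, here is a minimal toy model in C, assuming the SSL* ends up in some ex_data slot other than 0. The struct and function names (toy_store_ctx, toy_set_ex_data, toy_get_ex_data, toy_get_app_data, demo_lookup, fake_ssl) are hypothetical stand-ins, not OpenSSL's real implementation. Reading slot 0 yields NULL when the pointer was stored at index 1; mod_ssl would then hand that NULL to SSL_get_ex_data(), which would plausibly crash. Reading back with the setter's own index always works.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the ex_data slots an X509_STORE_CTX carries
 * (hypothetical; not OpenSSL's real CRYPTO_EX_DATA layout). */
#define TOY_MAX_SLOTS 4

typedef struct {
    void *slots[TOY_MAX_SLOTS];
} toy_store_ctx;

/* Plays the role of X509_STORE_CTX_set_ex_data(). */
static void toy_set_ex_data(toy_store_ctx *ctx, int idx, void *data)
{
    assert(idx >= 0 && idx < TOY_MAX_SLOTS);
    ctx->slots[idx] = data;
}

/* Plays the role of X509_STORE_CTX_get_ex_data(); an unset slot reads NULL. */
static void *toy_get_ex_data(const toy_store_ctx *ctx, int idx)
{
    assert(idx >= 0 && idx < TOY_MAX_SLOTS);
    return ctx->slots[idx];
}

/* Plays the role of the X509_STORE_CTX_get_app_data() macro: slot 0 only. */
static void *toy_get_app_data(const toy_store_ctx *ctx)
{
    return toy_get_ex_data(ctx, 0);
}

static int fake_ssl; /* its address stands in for the SSL* "s" */

/* Store at stored_idx as ssl_verify_cert_chain() does, then read it back
 * either via slot 0 (old callback) or via the setter's own index (fix). */
static void *demo_lookup(int stored_idx, int use_setter_index)
{
    toy_store_ctx ctx = {{NULL}}; /* stack-allocated, like "ctx" in ssl_cert.c */
    toy_set_ex_data(&ctx, stored_idx, &fake_ssl);
    return use_setter_index ? toy_get_ex_data(&ctx, stored_idx)
                            : toy_get_app_data(&ctx);
}
```

With the pointer stored at index 1, demo_lookup(1, 0) returns NULL while demo_lookup(1, 1) returns &fake_ssl; with index 0 both lookups agree, which is why the bug stays hidden on systems where the index happens to be 0.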
Could you:
1) describe the failure you see
2) reproduce this with the vanilla 2.0.52 release rather than the SuSE package
The failure is that if I include a ProxyPass statement from one SSL-enabled host to another SSL-enabled host, then as soon as I try to access a page that should be proxied from the other host, the Apache child process segfaults and I see nothing in my browser.

Here's a trimmed generic configuration that will generate the problem:

Host 1:
-------
<VirtualHost 1.2.3.4:443>
    ServerName host1.domain.com
    DocumentRoot /srv/www/host1
    SSLEngine on
    SSLProxyEngine on
    SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
    SSLCertificateFile /etc/apache2/ssl.crt/host1.domain.com.crt
    SSLCertificateKeyFile /etc/apache2/ssl.key/host1.domain.com.key
    ProxyPass /test.html https://host2.domain.com.:444/test.html
</VirtualHost>
<Directory /srv/www/host1>
    Order allow,deny
    Allow from all
    AllowOverride All
</Directory>

Host 2:
-------
Listen 444
<VirtualHost 1.2.3.5:444>
    ServerName host2.domain.com
    DocumentRoot /srv/www/host2
    SSLEngine on
    SSLCertificateKeyFile /etc/apache2/ssl.key/host2.domain.com.key
    SSLCertificateFile /etc/apache2/ssl.crt/host2.domain.com.crt
</VirtualHost>
<Directory /srv/www/host2>
    Order allow,deny
    Allow from all
    AllowOverride All
</Directory>

If you browse to https://host1.domain.com/test.html, it should be reverse-proxied from https://host2.domain.com:444/test.html, but instead the Apache process segfaults. I suspect that this is SMP-related or perhaps related to the x86_64 architecture, but that's only a suspicion.
Here's a backtrace from a core dump:

#0  0x0000002a97a72486 in CRYPTO_get_ex_data () from /usr/lib64/libcrypto.so.0.9.7
#1  0x0000002a978d766a in SSL_get_ex_data () from /usr/lib64/libssl.so.0.9.7
#2  0x0000002a977acd40 in ssl_callback_SSLVerify () from /usr/lib64/apache2-prefork/mod_ssl.so
#3  0x0000002a97aa67c2 in X509_verify_cert () from /usr/lib64/libcrypto.so.0.9.7
#4  0x0000002a978edd0c in ssl_verify_cert_chain () from /usr/lib64/libssl.so.0.9.7
#5  0x0000002a978e32eb in ssl3_get_server_certificate () from /usr/lib64/libssl.so.0.9.7
#6  0x0000002a978e23dc in ssl3_connect () from /usr/lib64/libssl.so.0.9.7
#7  0x0000002a978ec245 in SSL_connect () from /usr/lib64/libssl.so.0.9.7
#8  0x0000002a978e9f10 in ssl23_get_server_hello () from /usr/lib64/libssl.so.0.9.7
#9  0x0000002a978e992c in ssl23_connect () from /usr/lib64/libssl.so.0.9.7
#10 0x0000002a978ec245 in SSL_connect () from /usr/lib64/libssl.so.0.9.7
#11 0x0000002a977aa8dc in ssl_io_filter_connect () from /usr/lib64/apache2-prefork/mod_ssl.so
#12 0x0000002a977aaebe in ssl_io_filter_output () from /usr/lib64/apache2-prefork/mod_ssl.so
#13 0x0000000000433b6a in ap_pass_brigade ()
#14 0x0000002a9c255f3b in ap_proxy_http_request () from /usr/lib64/apache2-prefork/mod_proxy_http.so
#15 0x0000002a9c25707f in ap_proxy_http_handler () from /usr/lib64/apache2-prefork/mod_proxy_http.so
#16 0x0000002a9c14e3ab in proxy_run_scheme_handler () from /usr/lib64/apache2-prefork/mod_proxy.so
#17 0x0000002a9c14cf9b in proxy_handler () from /usr/lib64/apache2-prefork/mod_proxy.so
#18 0x0000000000427631 in ap_run_handler ()
#19 0x0000000000427ca9 in ap_invoke_handler ()
#20 0x0000000000424506 in ap_process_request ()
#21 0x000000000041fad8 in ap_process_http_connection ()
#22 0x00000000004316a1 in ap_run_process_connection ()
#23 0x0000000000431a02 in ap_process_connection ()
#24 0x0000000000425d22 in child_main ()
#25 0x0000000000425ee8 in make_child ()
#26 0x00000000004260b3 in perform_idle_server_maintenance ()
#27 0x0000000000426621 in ap_mpm_run ()
#28 0x000000000042cada in main ()

If I patch OpenSSL as I stated in the original post, it fixes the problem. I'll see if I can duplicate the problem with the stock 2.0.52. I have to proceed with caution, since this server is running a number of sites with a lot of traffic.
(In reply to comment #2) This is a duplicate of #3 below.
I am unable to reproduce this bug with a stock 2.0.52 Apache on the same system with more or less the same configuration (mainly just a change of port numbers). I am also unable to reproduce it with a stock 2.0.48 Apache. Furthermore, even after applying all of the patches (to 2.0.48) that SuSE uses to build their RPM, and using the same compiler options that they use, I still can't reproduce it. So either the run-time configuration changes somehow affect it, or something happens when building the RPM that affects it. Or maybe I'm just going crazy.
To be clear, you were testing the vanilla 2.0.52 and 2.0.48 sources against the *unpatched* version of OpenSSL, not the one you have patched?

Could you try changing the first line of ssl_callback_SSLVerify as follows, instead:

-    SSL *ssl = (SSL *)X509_STORE_CTX_get_app_data(ctx);
+    SSL *ssl = X509_STORE_CTX_get_ex_data(ctx,
+                                          SSL_get_ex_data_X509_STORE_CTX_idx());

I can't really see why that segfault could happen in the first place, though.
(In reply to comment #6)
> To be clear, you were testing the vanilla 2.0.52 and 2.0.48 sources against the
> *unpatched* version of OpenSSL, not the one you have patched?

That is correct, the unpatched OpenSSL.

> Could you try changing the first line of ssl_callback_SSLVerify as follows, instead:
>
> -    SSL *ssl = (SSL *)X509_STORE_CTX_get_app_data(ctx);
> +    SSL *ssl = X509_STORE_CTX_get_ex_data(ctx,
> +                                          SSL_get_ex_data_X509_STORE_CTX_idx());

I'll try it, but see below, because there may be bigger problems.

> I can't really see why that segfault could happen in the first place, though.

The reason it's happening is that in some cases the OpenSSL code is storing the SSL* pointer at index 1 rather than 0 (the X509_STORE_CTX_get_app_data() macro always uses 0). I discovered this by putting an fprintf statement in the function X509_STORE_CTX_get_ex_new_index() to see what values are being returned as indexes to SSL_get_ex_data_X509_STORE_CTX_idx(). Again, this only happens on the live Apache server, not the test one. Looking at the function SSL_get_ex_data_X509_STORE_CTX_idx(), one would presume that this is impossible.
For reference, here's the function:

int SSL_get_ex_data_X509_STORE_CTX_idx(void)
{
    static volatile int ssl_x509_store_ctx_idx= -1;

    if (ssl_x509_store_ctx_idx < 0)
    {
        /* any write lock will do; usually this branch
         * will only be taken once anyway */
        CRYPTO_w_lock(CRYPTO_LOCK_SSL_CTX);

        if (ssl_x509_store_ctx_idx < 0)
        {
            ssl_x509_store_ctx_idx=X509_STORE_CTX_get_ex_new_index(
                0,"SSL for verify callback",NULL,NULL,NULL);
        }

        CRYPTO_w_unlock(CRYPTO_LOCK_SSL_CTX);
    }
    return ssl_x509_store_ctx_idx;
}

Also for reference, here is a dump of the assembler for this code:

Dump of assembler code for function SSL_get_ex_data_X509_STORE_CTX_idx:
0x0000000000024710 <SSL_get_ex_data_X509_STORE_CTX_idx+0>:   sub $0x8,%rsp
0x0000000000024714 <SSL_get_ex_data_X509_STORE_CTX_idx+4>:   mov 1094190(%rip),%eax  # 0x12f948 <ssl_x509_store_ctx_idx.0>
0x000000000002471a <SSL_get_ex_data_X509_STORE_CTX_idx+10>:  test %eax,%eax
0x000000000002471c <SSL_get_ex_data_X509_STORE_CTX_idx+12>:  js 0x24730 <SSL_get_ex_data_X509_STORE_CTX_idx+32>
0x000000000002471e <SSL_get_ex_data_X509_STORE_CTX_idx+14>:  mov 1094180(%rip),%eax  # 0x12f948 <ssl_x509_store_ctx_idx.0>
0x0000000000024724 <SSL_get_ex_data_X509_STORE_CTX_idx+20>:  add $0x8,%rsp
0x0000000000024728 <SSL_get_ex_data_X509_STORE_CTX_idx+24>:  retq
0x0000000000024729 <SSL_get_ex_data_X509_STORE_CTX_idx+25>:  data16
0x000000000002472a <SSL_get_ex_data_X509_STORE_CTX_idx+26>:  data16
0x000000000002472b <SSL_get_ex_data_X509_STORE_CTX_idx+27>:  data16
0x000000000002472c <SSL_get_ex_data_X509_STORE_CTX_idx+28>:  nop
0x000000000002472d <SSL_get_ex_data_X509_STORE_CTX_idx+29>:  data16
0x000000000002472e <SSL_get_ex_data_X509_STORE_CTX_idx+30>:  data16
0x000000000002472f <SSL_get_ex_data_X509_STORE_CTX_idx+31>:  nop
0x0000000000024730 <SSL_get_ex_data_X509_STORE_CTX_idx+32>:  lea 20449(%rip),%rdx  # 0x29718 <empty.0+908>
0x0000000000024737 <SSL_get_ex_data_X509_STORE_CTX_idx+39>:  mov $0x8d,%ecx
0x000000000002473c <SSL_get_ex_data_X509_STORE_CTX_idx+44>:  mov $0xc,%esi
0x0000000000024741 <SSL_get_ex_data_X509_STORE_CTX_idx+49>:  mov $0x9,%edi
0x0000000000024746 <SSL_get_ex_data_X509_STORE_CTX_idx+54>:  callq 0xc268
0x000000000002474b <SSL_get_ex_data_X509_STORE_CTX_idx+59>:  mov 1094135(%rip),%eax  # 0x12f948 <ssl_x509_store_ctx_idx.0>
0x0000000000024751 <SSL_get_ex_data_X509_STORE_CTX_idx+65>:  test %eax,%eax
0x0000000000024753 <SSL_get_ex_data_X509_STORE_CTX_idx+67>:  jns 0x24770 <SSL_get_ex_data_X509_STORE_CTX_idx+96>
0x0000000000024755 <SSL_get_ex_data_X509_STORE_CTX_idx+69>:  lea 20423(%rip),%rsi  # 0x29723 <empty.0+919>
0x000000000002475c <SSL_get_ex_data_X509_STORE_CTX_idx+76>:  xor %r8d,%r8d
0x000000000002475f <SSL_get_ex_data_X509_STORE_CTX_idx+79>:  xor %ecx,%ecx
0x0000000000024761 <SSL_get_ex_data_X509_STORE_CTX_idx+81>:  xor %edx,%edx
0x0000000000024763 <SSL_get_ex_data_X509_STORE_CTX_idx+83>:  xor %edi,%edi
0x0000000000024765 <SSL_get_ex_data_X509_STORE_CTX_idx+85>:  callq 0xc8a8
0x000000000002476a <SSL_get_ex_data_X509_STORE_CTX_idx+90>:  mov %eax,1094104(%rip)  # 0x12f948 <ssl_x509_store_ctx_idx.0>
0x0000000000024770 <SSL_get_ex_data_X509_STORE_CTX_idx+96>:  lea 20385(%rip),%rdx  # 0x29718 <empty.0+908>
0x0000000000024777 <SSL_get_ex_data_X509_STORE_CTX_idx+103>: mov $0x95,%ecx
0x000000000002477c <SSL_get_ex_data_X509_STORE_CTX_idx+108>: mov $0xc,%esi
0x0000000000024781 <SSL_get_ex_data_X509_STORE_CTX_idx+113>: mov $0xa,%edi
0x0000000000024786 <SSL_get_ex_data_X509_STORE_CTX_idx+118>: callq 0xc268
0x000000000002478b <SSL_get_ex_data_X509_STORE_CTX_idx+123>: mov 1094071(%rip),%eax  # 0x12f948 <ssl_x509_store_ctx_idx.0>
0x0000000000024791 <SSL_get_ex_data_X509_STORE_CTX_idx+129>: add $0x8,%rsp
0x0000000000024795 <SSL_get_ex_data_X509_STORE_CTX_idx+133>: retq

The call to CRYPTO_w_lock() should ensure that ssl_x509_store_ctx_idx can only take on a value of zero. The assembler looks correct to me. It looks like a thread synchronization problem, but it's hard to believe that thread synchronization is broken.
Remember that this is an SMP box. I'm thinking that the reason I can't reproduce the problem is that the test server is not as heavily loaded as the live server. Also note that this is running the prefork MPM.
(In reply to comment #7)
I was just reading how the prefork module works, and it doesn't even use threads, so now I'm even more confused.
(In reply to comment #7) Also note that the only call to X509_STORE_CTX_get_ex_new_index() in apache and openssl is from the function SSL_get_ex_data_X509_STORE_CTX_idx().
(In reply to comment #6)
> Could you try changing the first line of ssl_callback_SSLVerify as follows, instead:
>
> -    SSL *ssl = (SSL *)X509_STORE_CTX_get_app_data(ctx);
> +    SSL *ssl = X509_STORE_CTX_get_ex_data(ctx,
> +                                          SSL_get_ex_data_X509_STORE_CTX_idx());

This is what I was thinking the fix should be, which is what I was driving at in my first post:

>> The bug is that a callback function has no way of retrieving
>> the value returned by SSL_get_ex_data_X509_STORE_CTX_idx(),
>> in apache's case it uses 0 via the X509_STORE_CTX_get_app_data() macro.

However, I was thinking that SSL_get_ex_data_X509_STORE_CTX_idx() wasn't exported from the library and therefore wasn't callable, so I started down other paths. I still can't see where else X509_STORE_CTX_get_ex_new_index() is being called from, but maybe I'm not seeing the big picture. I'm attempting to rebuild the Apache RPM now...
Good analysis, thanks. This could well be one of the insane cases where libssl.so gets loaded and unloaded during startup but libcrypto.so stays mapped throughout. Global variables in libcrypto.so therefore don't get reset to their initial state, but those in libssl.so do. Note that X509_STORE_CTX_get_ex_new_index is probably just incrementing some global variable behind the scenes (I haven't verified that), so if ssl_x509_store_ctx_idx gets reset to -1 but that global variable does not, then the _idx variable will quite likely get set to "1" next time round. That might also explain the crash. You could try some fprintf debugging in both libcrypto and libssl to verify this, or use LD_DEBUG to see when each library gets loaded and unloaded.
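The hypothesis can be sketched as a toy C model. It assumes, as suggested above but not verified, that the libcrypto side is a plain counter; crypto_next_idx, ssl_cached_idx and the toy_* functions are hypothetical, locking is left out (prefork runs one thread per process), and "reloading libssl.so" is modelled simply as resetting its static back to -1 while the libcrypto counter is left alone.

```c
#include <assert.h>

/* "libcrypto.so" state: next free X509_STORE_CTX ex_data index.
 * Assumed to be a plain counter that survives libssl.so reloads. */
static int crypto_next_idx = 0;

/* "libssl.so" state: the cached index, like the static
 * ssl_x509_store_ctx_idx; back to -1 on every fresh mapping. */
static int ssl_cached_idx = -1;

/* Plays the role of X509_STORE_CTX_get_ex_new_index(). */
static int toy_get_ex_new_index(void)
{
    return crypto_next_idx++;
}

/* Plays the role of SSL_get_ex_data_X509_STORE_CTX_idx(), minus locking. */
static int toy_ssl_store_ctx_idx(void)
{
    if (ssl_cached_idx < 0)
        ssl_cached_idx = toy_get_ex_new_index();
    return ssl_cached_idx;
}

/* Models libssl.so being unloaded and mapped again while libcrypto.so
 * stays resident: only libssl's static returns to its initialiser. */
static void toy_reload_libssl(void)
{
    ssl_cached_idx = -1;
}

/* Replays the suspected start-up sequence and returns the index the
 * reloaded libssl ends up caching. */
static int toy_replay_startup(void)
{
    int first = toy_ssl_store_ctx_idx(); /* first SSL_CTX_new(): index 0 */
    assert(first == 0);
    toy_reload_libssl();                 /* libssl.so unloaded, reloaded */
    return toy_ssl_store_ctx_idx();      /* next lookup allocates index 1 */
}
```

In this model toy_replay_startup() returns 1, which is exactly the slot that X509_STORE_CTX_get_app_data() (always slot 0) never reads.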
(In reply to comment #11)
> This could well be one of the insane cases which occurs where libssl.so gets
> loaded and unloaded during startup but libcrypto.so always stays mapped. Global
> variables in libcrypto.so hence don't get reset to their initialization state,
> but those in libssl.so do:

Yep, you guessed it. I put some printfs in libssl and libcrypto:

 1 29336:
 2 29336 644.580855: in crypto_init, ppid: 29335, count: 1
 3 29336 644.580921: in ssl_init, ppid: 29335, count: 1
 4 29336 645.198972: CRYPTO_get_ex_new_index, ix: 0, ppid: 29335, count2: 1
 5 29336 645.198980: /usr/lib64/libcrypto.so.0.9.7(my_dumper+0x2e) [0x2a97aac149]
 6 29336 645.198985: /usr/lib64/libcrypto.so.0.9.7(X509_STORE_CTX_get_ex_new_index+0x2b) [0x2a97aac25b]
 7 29336 645.198989: /usr/lib64/libssl.so.0.9.7(SSL_get_ex_data_X509_STORE_CTX_idx+0x50) [0x2a978ee580]
 8 29336 645.198993: /usr/lib64/libssl.so.0.9.7(SSL_CTX_new+0x1a) [0x2a978ed69a]
 9 29336 645.198997: /usr/lib64/apache2-prefork/mod_ssl.so [0x2a977a80fd]
10 29336 645.202025: in ssl_exit, ppid: 29335, count: 2
11 29336 645.209564: in ssl_init, ppid: 29335, count: 1
12 29336 645.608884: in ssl_exit, ppid: 29335, count: 2
13 29336 645.609069: in crypto_exit, ppid: 29335, count: 2
14 29337:
15 29336 644.580855: in crypto_init, ppid: 29335, count: 1
16 29336 645.198972: CRYPTO_get_ex_new_index, ix: 0, ppid: 29335, count2: 1
17 29336 645.198980: /usr/lib64/libcrypto.so.0.9.7(my_dumper+0x2e) [0x2a97aac149]
18 29336 645.198985: /usr/lib64/libcrypto.so.0.9.7(X509_STORE_CTX_get_ex_new_index+0x2b) [0x2a97aac25b]
19 29336 645.198989: /usr/lib64/libssl.so.0.9.7(SSL_get_ex_data_X509_STORE_CTX_idx+0x50) [0x2a978ee580]
20 29336 645.198993: /usr/lib64/libssl.so.0.9.7(SSL_CTX_new+0x1a) [0x2a978ed69a]
21 29336 645.198997: /usr/lib64/apache2-prefork/mod_ssl.so [0x2a977a80fd]
22 29336 645.209564: in ssl_init, ppid: 29335, count: 1
23 29337 645.699132: CRYPTO_get_ex_new_index, ix: 1, ppid: 1, count2: 2
24 29337 645.699147: /usr/lib64/libcrypto.so.0.9.7(my_dumper+0x2e) [0x2a97aac149]
25 29337 645.699152: /usr/lib64/libcrypto.so.0.9.7(X509_STORE_CTX_get_ex_new_index+0x2b) [0x2a97aac25b]
26 29337 645.699156: /usr/lib64/libssl.so.0.9.7(SSL_get_ex_data_X509_STORE_CTX_idx+0x50) [0x2a978ee580]
27 29337 645.699161: /usr/lib64/libssl.so.0.9.7(SSL_CTX_new+0x1a) [0x2a978ed69a]
28 29337 645.699164: /usr/lib64/apache2-prefork/mod_ssl.so [0x2a977a80fd]
29 29337 656.534013: in ssl_exit, ppid: 1, count: 2
30 29337 656.536308: in crypto_exit, ppid: 1, count: 2

The first column is the line number, the second is the process ID, and the third is the time (the fractional part is microseconds).

Lines 2-13 are from process ID 29336:
Line 2: libcrypto.so gets loaded and initialized (this output comes from an __attribute__((constructor)) function that I added).
Line 3: libssl.so gets loaded and initialized (output also from an __attribute__((constructor)) function).
Line 4: CRYPTO_get_ex_new_index gets called and returns 0 (the ix value).
Lines 5-9: traceback of the call into mod_ssl.
Line 10: libssl.so gets unloaded (output comes from an __attribute__((destructor)) function).
Line 11: libssl.so gets reloaded and reinitialized.
Line 12: libssl.so gets unloaded.
Line 13: libcrypto.so gets unloaded (output comes from an __attribute__((destructor)) function).

Lines 15-30 are from process ID 29337:
Lines 15-22: match lines 2, 4-9 and 11 of process 29336, so the child was forked from the same point, after line 11 (line 22). Line 10 isn't matched in 29337 because line 10 was lost when libssl.so was unloaded.
Line 23: CRYPTO_get_ex_new_index gets called and returns 1 (the ix value) rather than 0, because libcrypto.so was not unloaded and reinitialized but libssl.so was.
Lines 24-30: backtrace and libraries getting unloaded.

The patch you suggested fixed the problem.
Here is the patch file:

--------------------------------------------------------------
diff -r -u httpd-2.0.48-orig/modules/ssl/ssl_engine_kernel.c httpd-2.0.48/modules/ssl/ssl_engine_kernel.c
--- httpd-2.0.48-orig/modules/ssl/ssl_engine_kernel.c	2004-12-05 17:54:42.000000000 -0800
+++ httpd-2.0.48/modules/ssl/ssl_engine_kernel.c	2004-12-05 17:58:36.000000000 -0800
@@ -1205,7 +1205,8 @@
 int ssl_callback_SSLVerify(int ok, X509_STORE_CTX *ctx)
 {
     /* Get Apache context back through OpenSSL context */
-    SSL *ssl = (SSL *)X509_STORE_CTX_get_app_data(ctx);
+    SSL *ssl = (SSL *)X509_STORE_CTX_get_ex_data(ctx,
+                                                 SSL_get_ex_data_X509_STORE_CTX_idx());
     conn_rec *conn = (conn_rec *)SSL_get_app_data(ssl);
     server_rec *s = conn->base_server;
     request_rec *r = (request_rec *)SSL_get_app_data2(ssl);
--------------------------------------------------------------
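The load/unload tracing used for the log above can be sketched in C roughly as follows. This assumes GCC or Clang (for __attribute__((constructor)) and __attribute__((destructor))) and glibc (for backtrace() and backtrace_symbols_fd() from <execinfo.h>). The helper names and output format are hypothetical and only loosely mimic the crypto_init/ssl_exit/my_dumper output, whose source was not posted. Compiled into a shared library, the constructor runs when the library is mapped and the destructor when it is unmapped.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical names throughout; counts how often the constructor ran. */
static int trace_load_count = 0;

/* One trace line on stderr: pid, time (seconds mod 1000 . microseconds),
 * message and parent pid. */
static void trace_line(const char *what)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    fprintf(stderr, "%ld %ld.%06ld: %s, ppid: %ld\n",
            (long)getpid(), (long)(tv.tv_sec % 1000), (long)tv.tv_usec,
            what, (long)getppid());
}

/* Runs when the object containing it is mapped (startup or dlopen()). */
__attribute__((constructor))
static void trace_on_load(void)
{
    trace_load_count++;
    trace_line("library loaded");
}

/* Runs when the object is unmapped (exit or its last dlclose()). */
__attribute__((destructor))
static void trace_on_unload(void)
{
    trace_line("library unloaded");
}

/* Dump the current call stack to stderr, one frame per line, in roughly
 * the object(symbol+offset) [address] form seen in the log. */
static int trace_dump_stack(void)
{
    void *frames[32];
    int n = backtrace(frames, 32);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```

Writing to stderr, which stdio leaves unbuffered by default, means a forked child cannot re-emit lines that its parent had buffered but not yet flushed; with a buffered stdout, output pending at fork() time is printed by both processes.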
Thanks a lot for your thorough investigation! I'll apply the patch. But I wouldn't be surprised if there are more bugs like this lurking; abuse of global state is rife in OpenSSL. The safest fix is to ensure that httpd itself is always linked against both libssl and libcrypto, so neither ever gets unloaded at runtime. That should actually already be the case in all 2.0.x releases; this breakage may be an artefact of the SuSE build process.
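Why linking httpd itself against libssl helps can be illustrated with a toy reference-count model (hypothetical names, not the dynamic linker's real code). It assumes that a library's static data is re-initialised only when its reference count drops to zero and it is later mapped again; a link-time dependency of the httpd binary holds a reference that unloading mod_ssl.so never releases, so libssl.so's cached index survives.

```c
#include <assert.h>

/* Toy reference-counted mapping of libssl.so (hypothetical names). */
typedef struct {
    int refs;        /* how many loaded objects currently need the library */
    int cached_idx;  /* stands in for libssl's static ssl_x509_store_ctx_idx */
} toy_lib;

/* libcrypto's index counter; in this model libcrypto.so is never unmapped. */
static int toy_crypto_next_idx;

static void toy_load(toy_lib *lib)
{
    if (lib->refs++ == 0)
        lib->cached_idx = -1; /* fresh mapping: static initialiser applies */
}

static void toy_unload(toy_lib *lib)
{
    assert(lib->refs > 0);
    lib->refs--;              /* at zero the mapping and its state go away */
}

static int toy_idx(toy_lib *ssl)
{
    if (ssl->cached_idx < 0)
        ssl->cached_idx = toy_crypto_next_idx++;
    return ssl->cached_idx;
}

/* mod_ssl.so (and with it libssl.so) is loaded, allocates the index, is
 * unloaded and loaded again, then asks for the index a second time. */
static int toy_replay(int httpd_links_libssl)
{
    toy_lib ssl = {0, -1};
    toy_crypto_next_idx = 0;
    if (httpd_links_libssl)
        toy_load(&ssl);       /* reference held by the httpd binary itself */
    toy_load(&ssl);           /* mod_ssl.so pulls in libssl.so */
    toy_idx(&ssl);
    toy_unload(&ssl);         /* mod_ssl.so unloaded */
    toy_load(&ssl);           /* mod_ssl.so loaded again */
    return toy_idx(&ssl);
}
```

Here toy_replay(0) returns 1 (the failing case), while toy_replay(1) returns 0, because the extra reference keeps the count above zero across the mod_ssl.so reload.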
http://svn.apache.org/viewcvs?view=rev&rev=111241
*** Bug 34846 has been marked as a duplicate of this bug. ***