Bug 32529 - ProxyPass segmentation fault on SMP x86_64
Summary: ProxyPass segmentation fault on SMP x86_64
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_ssl
Version: 2.0.48
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: PatchAvailable
Duplicates: 34846
Depends on:
Blocks:
 
Reported: 2004-12-04 05:31 UTC by Mitch Frazier
Modified: 2005-05-10 14:47 UTC



Description Mitch Frazier 2004-12-04 05:31:08 UTC
The included patch is for openssl, but it's not 100% clear to me whether the real bug
is in apache or in openssl; fixing it in openssl was easiest.  I emailed the bug
to the openssl folks also.

apache version:  2.0.48-146
openssl version: 0.9.7b-125
OS:              SuSE 9.0 SMP/x86_64
Kernel:          2.4.21-260-smp

The problem I'm seeing is that apache will not perform a "ProxyPass" to another
SSL host.  The openssl function ssl_verify_cert_chain() [ssl/ssl_cert.c] stores
the SSL* pointer in the X509_STORE_CTX context with the following code:

  X509_STORE_CTX_set_ex_data(&ctx,SSL_get_ex_data_X509_STORE_CTX_idx(),s);

The apache callback function ssl_callback_SSLVerify()
[modules/ssl/ssl_engine_kernel.c] then retrieves this value with the following code:

  SSL *ssl = (SSL *)X509_STORE_CTX_get_app_data(ctx);

which is just a macro to retrieve index 0 of the ex_data.  This fails on the
above system.  I don't have an exact match single processor 32-bit machine for
comparison testing but I tested on a close match and it works fine.  The
following patch fixes the problem on the above system:

-----------------------
diff -Naur openssl-0.9.7b-orig/ssl/ssl_cert.c openssl-0.9.7b/ssl/ssl_cert.c
--- openssl-0.9.7b-orig/ssl/ssl_cert.c    2004-12-03 18:35:40.000000000 -0800
+++ openssl-0.9.7b/ssl/ssl_cert.c    2004-12-03 18:36:20.000000000 -0800
@@ -467,6 +467,7 @@
     if (SSL_get_verify_depth(s) >= 0)
         X509_STORE_CTX_set_depth(&ctx, SSL_get_verify_depth(s));
     X509_STORE_CTX_set_ex_data(&ctx,SSL_get_ex_data_X509_STORE_CTX_idx(),s);
+    X509_STORE_CTX_set_app_data(&ctx,s);

     /* We need to set the verify purpose. The purpose can be determined by
      * the context: if its a server it will verify SSL client certificates
-----------------------

The bug is that a callback function has no way of retrieving the value returned
by SSL_get_ex_data_X509_STORE_CTX_idx().  In apache's case it uses 0 via the
X509_STORE_CTX_get_app_data() macro.

This may not be the "correct" ultimate fix as I'm not sure if there's a reason
why index 0 might not be available.  The "ctx" structure above is stack
allocated and only used for the duration of the ssl_verify_cert_chain() call.
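
To make the mismatch concrete, here is a minimal sketch in C.  It is a toy
model only: the toy_* names are hypothetical and are not part of the OpenSSL
API.  It assumes that ex_data behaves like a small array of pointer slots, and
that the "app_data" accessor is simply a lookup at slot 0, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an ex_data container: a fixed array of pointer slots.
 * Every toy_* name is hypothetical; none of this is the OpenSSL API. */
#define TOY_SLOTS 4

typedef struct {
    void *slot[TOY_SLOTS];
} toy_store_ctx;

/* Store a pointer at an explicitly chosen index. */
static void toy_set_ex_data(toy_store_ctx *ctx, int idx, void *p)
{
    assert(idx >= 0 && idx < TOY_SLOTS);
    ctx->slot[idx] = p;
}

/* Fetch the pointer stored at an explicitly chosen index. */
static void *toy_get_ex_data(const toy_store_ctx *ctx, int idx)
{
    assert(idx >= 0 && idx < TOY_SLOTS);
    return ctx->slot[idx];
}

/* Models the "app_data" convenience accessor: always slot 0. */
static void *toy_get_app_data(const toy_store_ctx *ctx)
{
    return toy_get_ex_data(ctx, 0);
}
```

In this model, a pointer stored at index 1 is invisible to toy_get_app_data(),
which finds an empty slot 0.  Storing the same pointer at slot 0 as well, which
is what the openssl patch above does, makes both lookups agree.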
Comment 1 Joe Orton 2004-12-04 10:17:23 UTC
Could you:

1) describe the failure you see
2) reproduce this with the vanilla 2.0.52 release rather than the SuSE package
Comment 2 Mitch Frazier 2004-12-04 17:37:01 UTC
The failure is that if I include a ProxyPass statement from one SSL-enabled host
to another SSL-enabled host, then as soon as I try to access a page that should be
proxied from the other host, the apache child process seg faults and I see nothing
in my browser.  Here's a trimmed generic configuration that will generate the problem:

Host 1:
-------
    <VirtualHost 1.2.3.4:443>
        ServerName      host1.domain.com

        DocumentRoot    /srv/www/host1

        SSLEngine                       on
        SSLProxyEngine                  on
        SSLCipherSuite                  ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
        SSLCertificateFile              /etc/apache2/ssl.crt/host1.domain.com.crt
        SSLCertificateKeyFile           /etc/apache2/ssl.key/host1.domain.com.key

        ProxyPass      /test.html       https://host2.domain.com.:444/test.html
    </VirtualHost>
    <Directory /srv/www/host1>
        Order allow,deny
        Allow from all
        AllowOverride All
    </Directory>

Host 2:
-------
    Listen 444

    <VirtualHost 1.2.3.5:444>
        ServerName      host2.domain.com

        DocumentRoot    /srv/www/host2

        SSLEngine                       on
        SSLCertificateKeyFile           /etc/apache2/ssl.key/host2.domain.com.key
        SSLCertificateFile              /etc/apache2/ssl.crt/host2.domain.com.crt
    </VirtualHost>

    <Directory /srv/www/host2>
        Order allow,deny
        Allow from all
        AllowOverride All
    </Directory>

If you browse to https://host1.domain.com/test.html it should be reverse proxied
from https://host2.domain.com:444/test.html, but instead the apache process seg
faults.  I suspect that this is SMP related or perhaps related to the x86_64
architecture, but that's only a suspicion.


Here's a backtrace from a core dump:

#0  0x0000002a97a72486 in CRYPTO_get_ex_data () from /usr/lib64/libcrypto.so.0.9.7
#1  0x0000002a978d766a in SSL_get_ex_data () from /usr/lib64/libssl.so.0.9.7
#2  0x0000002a977acd40 in ssl_callback_SSLVerify () from /usr/lib64/apache2-prefork/mod_ssl.so
#3  0x0000002a97aa67c2 in X509_verify_cert () from /usr/lib64/libcrypto.so.0.9.7
#4  0x0000002a978edd0c in ssl_verify_cert_chain () from /usr/lib64/libssl.so.0.9.7
#5  0x0000002a978e32eb in ssl3_get_server_certificate () from /usr/lib64/libssl.so.0.9.7
#6  0x0000002a978e23dc in ssl3_connect () from /usr/lib64/libssl.so.0.9.7
#7  0x0000002a978ec245 in SSL_connect () from /usr/lib64/libssl.so.0.9.7
#8  0x0000002a978e9f10 in ssl23_get_server_hello () from /usr/lib64/libssl.so.0.9.7
#9  0x0000002a978e992c in ssl23_connect () from /usr/lib64/libssl.so.0.9.7
#10 0x0000002a978ec245 in SSL_connect () from /usr/lib64/libssl.so.0.9.7
#11 0x0000002a977aa8dc in ssl_io_filter_connect () from /usr/lib64/apache2-prefork/mod_ssl.so
#12 0x0000002a977aaebe in ssl_io_filter_output () from /usr/lib64/apache2-prefork/mod_ssl.so
#13 0x0000000000433b6a in ap_pass_brigade ()
#14 0x0000002a9c255f3b in ap_proxy_http_request () from /usr/lib64/apache2-prefork/mod_proxy_http.so
#15 0x0000002a9c25707f in ap_proxy_http_handler () from /usr/lib64/apache2-prefork/mod_proxy_http.so
#16 0x0000002a9c14e3ab in proxy_run_scheme_handler () from /usr/lib64/apache2-prefork/mod_proxy.so
#17 0x0000002a9c14cf9b in proxy_handler () from /usr/lib64/apache2-prefork/mod_proxy.so
#18 0x0000000000427631 in ap_run_handler ()
#19 0x0000000000427ca9 in ap_invoke_handler ()
#20 0x0000000000424506 in ap_process_request ()
#21 0x000000000041fad8 in ap_process_http_connection ()
#22 0x00000000004316a1 in ap_run_process_connection ()
#23 0x0000000000431a02 in ap_process_connection ()
#24 0x0000000000425d22 in child_main ()
#25 0x0000000000425ee8 in make_child ()
#26 0x00000000004260b3 in perform_idle_server_maintenance ()
#27 0x0000000000426621 in ap_mpm_run ()
#28 0x000000000042cada in main ()

If I patch openssl as I stated in the original post it fixes the problem.

I'll see if I can duplicate the problem with the stock 2.0.52.  I have to
proceed with caution since this server is running a number of sites with a lot
of traffic.
Comment 5 Mitch Frazier 2004-12-05 01:16:11 UTC
I am unable to reproduce this bug with a stock 2.0.52 apache on the same system
with more or less the same configuration (mainly just a change of port numbers).
I am also unable to reproduce it with a stock 2.0.48 apache.

Furthermore, even after applying all of the patches (to 2.0.48) that SuSE uses
to build their RPM, and using the same compiler options that they use, I still
can't reproduce it.  So either the run-time configuration changes somehow affect
it, or something happens when building the RPM which affects it.  Or maybe I'm
just going crazy.
Comment 6 Joe Orton 2004-12-05 21:57:09 UTC
To be clear, you were testing the vanilla 2.0.52 and 2.0.48 sources against the
*unpatched* version of OpenSSL, not the one you have patched?

Could you try changing the first line of ssl_callback_SSLVerify as follows, instead:

-    SSL *ssl            = (SSL *)X509_STORE_CTX_get_app_data(ctx);
+    SSL *ssl = X509_STORE_CTX_get_ex_data(ctx,
+                                         SSL_get_ex_data_X509_STORE_CTX_idx());

I can't really see why that segfault could happen in the first place, though.
Comment 7 Mitch Frazier 2004-12-06 01:49:41 UTC
(In reply to comment #6)
> To be clear, you were testing the vanilla 2.0.52 and 2.0.48 sources against the
> *unpatched* version of OpenSSL, not the one you have patched?
That is correct, the unpatched OpenSSL.
> 
> Could you try changing the first line of ssl_callback_SSLVerify as follows,
> instead:
> 
> -    SSL *ssl            = (SSL *)X509_STORE_CTX_get_app_data(ctx);
> +    SSL *ssl = X509_STORE_CTX_get_ex_data(ctx,
> +                                         SSL_get_ex_data_X509_STORE_CTX_idx());
I'll try it, but see below because there may be bigger problems.

> 
> I can't really see why that segfault could happen in the first place, though.
The reason that it's happening is that in some cases the openssl code is storing
the SSL* pointer at an index of 1 rather than 0 (the
X509_STORE_CTX_get_app_data() macro always uses 0).  I discovered this by
putting an fprintf statement in the function X509_STORE_CTX_get_ex_new_index() to
see what values are being returned as indexes in the
SSL_get_ex_data_X509_STORE_CTX_idx() function.  Again, this only happens on the
live apache server, not the test one.

By looking at the function  SSL_get_ex_data_X509_STORE_CTX_idx() one would
presume that this would be impossible.  For reference here's the function:

int SSL_get_ex_data_X509_STORE_CTX_idx(void)
	{
	static volatile int ssl_x509_store_ctx_idx= -1;

	if (ssl_x509_store_ctx_idx < 0)
		{
		/* any write lock will do; usually this branch
		 * will only be taken once anyway */
		CRYPTO_w_lock(CRYPTO_LOCK_SSL_CTX);

		if (ssl_x509_store_ctx_idx < 0)
			{
			ssl_x509_store_ctx_idx=X509_STORE_CTX_get_ex_new_index(
				0,"SSL for verify callback",NULL,NULL,NULL);
			}

		CRYPTO_w_unlock(CRYPTO_LOCK_SSL_CTX);
		}
	return ssl_x509_store_ctx_idx;
	}


Also for reference, here is a dump of the assembler for this code:

Dump of assembler code for function SSL_get_ex_data_X509_STORE_CTX_idx:
0x0000000000024710 <SSL_get_ex_data_X509_STORE_CTX_idx+0>:      sub    $0x8,%rsp
0x0000000000024714 <SSL_get_ex_data_X509_STORE_CTX_idx+4>:      mov    1094190(%rip),%eax        # 0x12f948 <ssl_x509_store_ctx_idx.0>
0x000000000002471a <SSL_get_ex_data_X509_STORE_CTX_idx+10>:     test   %eax,%eax
0x000000000002471c <SSL_get_ex_data_X509_STORE_CTX_idx+12>:     js     0x24730 <SSL_get_ex_data_X509_STORE_CTX_idx+32>
0x000000000002471e <SSL_get_ex_data_X509_STORE_CTX_idx+14>:     mov    1094180(%rip),%eax        # 0x12f948 <ssl_x509_store_ctx_idx.0>
0x0000000000024724 <SSL_get_ex_data_X509_STORE_CTX_idx+20>:     add    $0x8,%rsp
0x0000000000024728 <SSL_get_ex_data_X509_STORE_CTX_idx+24>:     retq
0x0000000000024729 <SSL_get_ex_data_X509_STORE_CTX_idx+25>:     data16
0x000000000002472a <SSL_get_ex_data_X509_STORE_CTX_idx+26>:     data16
0x000000000002472b <SSL_get_ex_data_X509_STORE_CTX_idx+27>:     data16
0x000000000002472c <SSL_get_ex_data_X509_STORE_CTX_idx+28>:     nop
0x000000000002472d <SSL_get_ex_data_X509_STORE_CTX_idx+29>:     data16
0x000000000002472e <SSL_get_ex_data_X509_STORE_CTX_idx+30>:     data16
0x000000000002472f <SSL_get_ex_data_X509_STORE_CTX_idx+31>:     nop
0x0000000000024730 <SSL_get_ex_data_X509_STORE_CTX_idx+32>:     lea    20449(%rip),%rdx        # 0x29718 <empty.0+908>
0x0000000000024737 <SSL_get_ex_data_X509_STORE_CTX_idx+39>:     mov    $0x8d,%ecx
0x000000000002473c <SSL_get_ex_data_X509_STORE_CTX_idx+44>:     mov    $0xc,%esi
0x0000000000024741 <SSL_get_ex_data_X509_STORE_CTX_idx+49>:     mov    $0x9,%edi
0x0000000000024746 <SSL_get_ex_data_X509_STORE_CTX_idx+54>:     callq  0xc268
0x000000000002474b <SSL_get_ex_data_X509_STORE_CTX_idx+59>:     mov    1094135(%rip),%eax        # 0x12f948 <ssl_x509_store_ctx_idx.0>
0x0000000000024751 <SSL_get_ex_data_X509_STORE_CTX_idx+65>:     test   %eax,%eax
0x0000000000024753 <SSL_get_ex_data_X509_STORE_CTX_idx+67>:     jns    0x24770 <SSL_get_ex_data_X509_STORE_CTX_idx+96>
0x0000000000024755 <SSL_get_ex_data_X509_STORE_CTX_idx+69>:     lea    20423(%rip),%rsi        # 0x29723 <empty.0+919>
0x000000000002475c <SSL_get_ex_data_X509_STORE_CTX_idx+76>:     xor    %r8d,%r8d
0x000000000002475f <SSL_get_ex_data_X509_STORE_CTX_idx+79>:     xor    %ecx,%ecx
0x0000000000024761 <SSL_get_ex_data_X509_STORE_CTX_idx+81>:     xor    %edx,%edx
0x0000000000024763 <SSL_get_ex_data_X509_STORE_CTX_idx+83>:     xor    %edi,%edi
0x0000000000024765 <SSL_get_ex_data_X509_STORE_CTX_idx+85>:     callq  0xc8a8
0x000000000002476a <SSL_get_ex_data_X509_STORE_CTX_idx+90>:     mov    %eax,1094104(%rip)        # 0x12f948 <ssl_x509_store_ctx_idx.0>
0x0000000000024770 <SSL_get_ex_data_X509_STORE_CTX_idx+96>:     lea    20385(%rip),%rdx        # 0x29718 <empty.0+908>
0x0000000000024777 <SSL_get_ex_data_X509_STORE_CTX_idx+103>:    mov    $0x95,%ecx
0x000000000002477c <SSL_get_ex_data_X509_STORE_CTX_idx+108>:    mov    $0xc,%esi
0x0000000000024781 <SSL_get_ex_data_X509_STORE_CTX_idx+113>:    mov    $0xa,%edi
0x0000000000024786 <SSL_get_ex_data_X509_STORE_CTX_idx+118>:    callq  0xc268
0x000000000002478b <SSL_get_ex_data_X509_STORE_CTX_idx+123>:    mov    1094071(%rip),%eax        # 0x12f948 <ssl_x509_store_ctx_idx.0>
0x0000000000024791 <SSL_get_ex_data_X509_STORE_CTX_idx+129>:    add    $0x8,%rsp
0x0000000000024795 <SSL_get_ex_data_X509_STORE_CTX_idx+133>:    retq

The call to CRYPTO_w_lock() should ensure that ssl_x509_store_ctx_idx can only
take on a value of zero.  The assembler looks correct to me.

It looks like a thread synchronization problem, but it's hard to believe that
thread synchronization is broken.  Remember that this is an SMP box.  I'm
thinking that the reason I can't reproduce the problem is that the test
server is not as heavily loaded as the live server.

Also note that this is running the prefork mpm module.

Comment 8 Mitch Frazier 2004-12-06 02:07:42 UTC
(In reply to comment #7)
I was just reading how the prefork module works and it doesn't even have threads
so now I'm more confused.
Comment 9 Mitch Frazier 2004-12-06 02:32:47 UTC
(In reply to comment #7)
Also note that the only call to X509_STORE_CTX_get_ex_new_index() in apache and
openssl is from the function SSL_get_ex_data_X509_STORE_CTX_idx().
Comment 10 Mitch Frazier 2004-12-06 03:17:19 UTC
(In reply to comment #6)
> Could you try changing the first line of ssl_callback_SSLVerify as follows,
> instead:
> 
> -    SSL *ssl            = (SSL *)X509_STORE_CTX_get_app_data(ctx);
> +    SSL *ssl = X509_STORE_CTX_get_ex_data(ctx,
> +                                         SSL_get_ex_data_X509_STORE_CTX_idx());

This is what I was thinking the fix should be, which is what I was
driving at in my first post:

>> The bug is that a callback function has no way of retrieving
>> the value returned by SSL_get_ex_data_X509_STORE_CTX_idx(),
>> in apache's case it uses 0 via the X509_STORE_CTX_get_app_data() macro.

However, I was thinking that SSL_get_ex_data_X509_STORE_CTX_idx() wasn't
exported from the library and therefore was not callable, so I got started down
other paths.

I can't see where else X509_STORE_CTX_get_ex_new_index() is being
called from, but maybe I'm not seeing the big picture.

I'm attempting to rebuild the apache RPM now...
Comment 11 Joe Orton 2004-12-06 09:09:28 UTC
Good analysis, thanks.

This could well be one of the insane cases which occurs where libssl.so gets
loaded and unloaded during startup but libcrypto.so always stays mapped.  Global
variables in libcrypto.so hence don't get reset to their initialization state,
but those in libssl.so do: 

Note that X509_STORE_CTX_get_ex_new_index is probably just incrementing some
global variable behind the scenes (I haven't verified that), so if
ssl_x509_store_ctx_idx gets reset to -1 but that global variable does not, then
the _idx variable will quite likely get set to "1" next time round.

That might also explain the crash.  You could try some fprintf debugging in both
libcrypto and libssl to try and verify this; or LD_DEBUG stuff to see when each
is getting loaded and unloaded.
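
The reload scenario can be modelled in a few lines of C.  This is only a
sketch, assuming the index allocator is a plain counter (which has not been
verified against the OpenSSL source).  Every toy_* name is made up for
illustration, and the locking in the real function is left out.

```c
#include <assert.h>

/* Hypothetical stand-ins; these are not the real OpenSSL symbols. */

/* "libcrypto" state: the library stays mapped, so this is never reset. */
static int toy_crypto_next_index = 0;

/* Hands out the next free index, like a simple global allocator would. */
static int toy_crypto_get_new_index(void)
{
    return toy_crypto_next_index++;
}

/* "libssl" state: the cached index, back to -1 each time libssl is loaded. */
static int toy_ssl_cached_idx = -1;

/* Models unloading and reloading libssl only: its statics are re-initialised. */
static void toy_reload_libssl(void)
{
    toy_ssl_cached_idx = -1;
}

/* Same shape as SSL_get_ex_data_X509_STORE_CTX_idx(), minus the locking. */
static int toy_ssl_get_idx(void)
{
    if (toy_ssl_cached_idx < 0)
        toy_ssl_cached_idx = toy_crypto_get_new_index();
    assert(toy_ssl_cached_idx >= 0);
    return toy_ssl_cached_idx;
}
```

In this model the first call to toy_ssl_get_idx() yields 0 and later calls
reuse the cached value.  After toy_reload_libssl() the cache is empty again
while the counter is not, so the next call yields 1.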
Comment 12 Mitch Frazier 2004-12-08 04:12:18 UTC
(In reply to comment #11)
> This could well be one of the insane cases which occurs where libssl.so gets
> loaded and unloaded during startup but libcrypto.so always stays mapped.  Global
> variables in libcrypto.so hence don't get reset to their initialization state,
> but those in libssl.so do: 
> 
Yep, you guessed it.  I put some printfs in libssl and libcrypto:

 1  29336:
 2    29336  644.580855: in crypto_init, ppid: 29335, count: 1
 3    29336  644.580921: in ssl_init, ppid: 29335, count: 1
 4    29336  645.198972: CRYPTO_get_ex_new_index, ix: 0, ppid: 29335, count2: 1
 5    29336  645.198980: /usr/lib64/libcrypto.so.0.9.7(my_dumper+0x2e) [0x2a97aac149]
 6    29336  645.198985: /usr/lib64/libcrypto.so.0.9.7(X509_STORE_CTX_get_ex_new_index+0x2b) [0x2a97aac25b]
 7    29336  645.198989: /usr/lib64/libssl.so.0.9.7(SSL_get_ex_data_X509_STORE_CTX_idx+0x50) [0x2a978ee580]
 8    29336  645.198993: /usr/lib64/libssl.so.0.9.7(SSL_CTX_new+0x1a) [0x2a978ed69a]
 9    29336  645.198997: /usr/lib64/apache2-prefork/mod_ssl.so [0x2a977a80fd]
10    29336  645.202025: in ssl_exit, ppid: 29335, count: 2
11    29336  645.209564: in ssl_init, ppid: 29335, count: 1
12    29336  645.608884: in ssl_exit, ppid: 29335, count: 2
13    29336  645.609069: in crypto_exit, ppid: 29335, count: 2
14  29337:
15    29336  644.580855: in crypto_init, ppid: 29335, count: 1
16    29336  645.198972: CRYPTO_get_ex_new_index, ix: 0, ppid: 29335, count2: 1
17    29336  645.198980: /usr/lib64/libcrypto.so.0.9.7(my_dumper+0x2e) [0x2a97aac149]
18    29336  645.198985: /usr/lib64/libcrypto.so.0.9.7(X509_STORE_CTX_get_ex_new_index+0x2b) [0x2a97aac25b]
19    29336  645.198989: /usr/lib64/libssl.so.0.9.7(SSL_get_ex_data_X509_STORE_CTX_idx+0x50) [0x2a978ee580]
20    29336  645.198993: /usr/lib64/libssl.so.0.9.7(SSL_CTX_new+0x1a) [0x2a978ed69a]
21    29336  645.198997: /usr/lib64/apache2-prefork/mod_ssl.so [0x2a977a80fd]
22    29336  645.209564: in ssl_init, ppid: 29335, count: 1
23    29337  645.699132: CRYPTO_get_ex_new_index, ix: 1, ppid: 1, count2: 2
24    29337  645.699147: /usr/lib64/libcrypto.so.0.9.7(my_dumper+0x2e) [0x2a97aac149]
25    29337  645.699152: /usr/lib64/libcrypto.so.0.9.7(X509_STORE_CTX_get_ex_new_index+0x2b) [0x2a97aac25b]
26    29337  645.699156: /usr/lib64/libssl.so.0.9.7(SSL_get_ex_data_X509_STORE_CTX_idx+0x50) [0x2a978ee580]
27    29337  645.699161: /usr/lib64/libssl.so.0.9.7(SSL_CTX_new+0x1a) [0x2a978ed69a]
28    29337  645.699164: /usr/lib64/apache2-prefork/mod_ssl.so [0x2a977a80fd]
29    29337  656.534013: in ssl_exit, ppid: 1, count: 2
30    29337  656.536308: in crypto_exit, ppid: 1, count: 2

The first column is line numbers, the second is process id, the third
is time (fractional part is microseconds).

Lines 2-13 are from process id 29336:
Line 2:      libcrypto.so gets loaded and initialized
             (this output is coming from a __attribute__((constructor))
             function that I added).
Line 3:      libssl.so gets loaded and initialized
             (output also from a __attribute__((constructor)) function)
Line 4:      CRYPTO_get_ex_new_index gets called and returns 0 (the ix value)
Lines 5-9:   traceback of the call into mod_ssl
Line 10:     libssl.so gets unloaded
             (output coming from a __attribute__((destructor)) function)
Line 11:     libssl.so gets reloaded and reinitialized
Line 12:     libssl.so gets unloaded
Line 13:     libcrypto.so gets unloaded
             (output coming from a __attribute__((destructor)) function)

Lines 15-30 are from process id 29337:
Lines 15-22: match lines 2-9 and line 11 in process 29336, so they
             were forked from the same point after line 11 (22).
             line 10 isn't matched in 29337 because line 10 was lost
             when libssl.so was unloaded.
Line 23:     CRYPTO_get_ex_new_index gets called and returns 1 (the ix value)
             rather than 0 because libcrypto.so was not unloaded and
             reinitialized but libssl.so was.
Lines 24-30: backtrace and libraries getting unloaded
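
For anyone wanting to repeat this kind of tracing, the load/unload hooks can
be sketched roughly as follows.  This is an illustration rather than the exact
code used for the log above: the toy_* names are hypothetical, timestamps are
omitted, and __attribute__((constructor)) and __attribute__((destructor)) are
GCC/Clang extensions.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical counter, bumped on every load and unload event. */
static int toy_count = 0;

/* Formats one trace line into buf and returns the snprintf() result. */
static int toy_format_event(char *buf, size_t len, const char *what)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "in %s, ppid: %ld, count: %d",
                    what, (long)getppid(), ++toy_count);
}

/* Writes one trace line to stderr, prefixed with the process id. */
static void toy_log_event(const char *what)
{
    char line[128];

    toy_format_event(line, sizeof line, what);
    fprintf(stderr, "%ld  %s\n", (long)getpid(), line);
}

/* Runs when the object containing it is loaded. */
__attribute__((constructor))
static void toy_init(void)
{
    toy_log_event("toy_init");
}

/* Runs when the object is unloaded or the process exits normally. */
__attribute__((destructor))
static void toy_exit(void)
{
    toy_log_event("toy_exit");
}
```

Built into a shared library, hooks like these print one line each time the
library is mapped or unmapped, which is enough to see whether one library is
being reloaded while another stays resident.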

The patch you suggested fixed the problem.  Here is the patch file:

--------------------------------------------------------------
diff -r -u httpd-2.0.48-orig/modules/ssl/ssl_engine_kernel.c httpd-2.0.48/modules/ssl/ssl_engine_kernel.c
--- httpd-2.0.48-orig/modules/ssl/ssl_engine_kernel.c   2004-12-05 17:54:42.000000000 -0800
+++ httpd-2.0.48/modules/ssl/ssl_engine_kernel.c        2004-12-05 17:58:36.000000000 -0800
@@ -1205,7 +1205,8 @@
 int ssl_callback_SSLVerify(int ok, X509_STORE_CTX *ctx)
 {
     /* Get Apache context back through OpenSSL context */
-    SSL *ssl            = (SSL *)X509_STORE_CTX_get_app_data(ctx);
+    SSL *ssl            = (SSL *)X509_STORE_CTX_get_ex_data(ctx,
+                                     SSL_get_ex_data_X509_STORE_CTX_idx());
     conn_rec *conn      = (conn_rec *)SSL_get_app_data(ssl);
     server_rec *s       = conn->base_server;
     request_rec *r      = (request_rec *)SSL_get_app_data2(ssl);
--------------------------------------------------------------

Comment 13 Joe Orton 2004-12-08 10:08:26 UTC
Thanks a lot for your thorough investigation!

I'll apply the patch.  But I'd not be surprised if there are more bugs like this
lurking; abuse of global state is rife in OpenSSL.  The safest fix is to ensure
that httpd itself is always linked against both libssl and libcrypto, so neither
ever gets unloaded at runtime.  That actually should be done in all 2.0.x
releases; it may be an artefact of the SuSE build process that this breaks.
Comment 15 Joe Orton 2005-05-10 22:47:03 UTC
*** Bug 34846 has been marked as a duplicate of this bug. ***