Bug 46352 - Apache 1.3 crashes inside mod_jk.so when using SetHandler jakarta-servlet and no JkMount
Apache 1.3 crashes inside mod_jk.so when using SetHandler jakarta-servlet and...
Status: RESOLVED FIXED
Product: Tomcat Connectors
Classification: Unclassified
Component: mod_jk
1.2.27
Macintosh Mac OS X 10.4
: P2 normal (vote)
: ---
Assigned To: Tomcat Developers Mailing List
:
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2008-12-05 03:33 UTC by Patrick Middleton
Modified: 2008-12-18 02:57 UTC (History)
0 users



Attachments
sample of debugging code, output, and httpd.conf files (32.42 KB, application/zip)
2008-12-09 04:01 UTC, Patrick Middleton
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Patrick Middleton 2008-12-05 03:33:20 UTC
MacOSX 10.4.11 PPC, Tomcat 6.0.18, Tomcat native 1.1.15, stock Apache 1.3.41

When testing mod_jk 1.2.27 (AJP 1.3 connector), works fine with HTTP, crashes with HTTPS.  ModSSL is configured with a selfsigned SSL cert.
Does not crash with mod_jk 1.2.23.
With 1.2.27 and CA-signed cert: not tested.

Crash report excerpt:
Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_INVALID_ADDRESS (0x0001) at 0x262379f2

Thread 0 Crashed:
0   <<00000000>> 	0xffff8b94 __memcpy + 1012 (cpu_capabilities.h:189)
1   mod_jk.so 	0x0160e4e0 init_ws_service + 544 (mod_jk.c:659)
2   mod_jk.so 	0x016107e4 jk_handler + 1096 (mod_jk.c:2200)
3   httpd     	0x0000d91c ap_invoke_handler + 232
4   httpd     	0x000179d8 process_request_internal + 640
5   httpd     	0x00017a58 ap_process_request + 72
6   httpd     	0x000065f4 child_main + 1832
7   httpd     	0x00006858 make_child + 312
8   httpd     	0x00006c18 perform_idle_server_maintenance + 708
9   httpd     	0x000071fc standalone_main + 972
10  httpd     	0x000078a0 main + 1052
11  httpd     	0x00001cc4 _start + 340
12  httpd     	0x00001b6c start + 60

The reference to cpu_capabilities.h suggests that something, most likely httpd, got built against the MacOSX 10.3.9 SDK.  Anything I built (tomcat native, mod_jk, apr) was built universal against the MacOSX 10.4u sdk.
Comment 1 Rainer Jung 2008-12-05 03:57:26 UTC
Line 659 is:

reply_timeout = ap_table_get(r->subprocess_env, "JK_REPLY_TIMEOUT");

So it seems we have trouble retrieving data from the httpd environment variables.
Can you produce a core and use a debugger like gdb to check, what r->subprocess_env is in your case? It should not be null and its type should be table. ap_table_get() is provided by httpd itself.

Did you build mod_jk yourself? Did you use the apxs in configure, that came with your httpd? You can also try adding --enable-EAPI to configure, but I doubt it will help.

The above line is part of a new feature. If you don't want to set reply_timeouts via Apache httpd environment variables, you can comment the line out. I would expect, that you'll then get crashes in other parts of the code, but we might get a better idea what's wrong.
Comment 2 Patrick Middleton 2008-12-05 07:55:49 UTC
Yes, I built mod_jk.so myself -- Apple doesn't supply this with MacOSX, only with MacOSX Server, and unlike mod_ssl the source used is not available via www.opensource.apple.com/darwinsource/ .

The configure command used (mea culpa, should have been in the original submission) was (ad lib http://developer.apple.com/technotes/tn2005/tn2137.html ) 

env CFLAGS="-O -g -isysroot /Developer/SDKs/MacOSX10.4u.sdk -arch i386 -arch ppc" \
  LDFLAGS="-arch i386 -arch ppc" \
  ./configure \
  --disable-dependency-tracking \
  --enable-EAPI \
  --with-apxs=/usr/sbin/apxs



Create a coredump on MacOSX for debugging purposes?  Never seen that done.  Attaching with gdb would work if I could tell which httpd server process was going to handle the request.  I tried logging with jk_log, but the process was crashing before the log was flushed.  I added this:

     if (r) {
        int myHandle = open("/tmp/log.txt", O_WRONLY|O_APPEND|O_CREAT, 0666);
        char myBuf[1024];
        sprintf(myBuf, "%s:%d r: 0x%08x r->subprocess_env: 0x%08x\n", __FILE__, __LINE__, (int)r, (r ? (int)( r->subprocess_env) : 0));
        write(myHandle, myBuf, strlen(myBuf));
        if (r && r->subprocess_env) {
            sprintf(myBuf, "%s:%d r->subprocess_env: %s\n", __FILE__, __LINE__, ap_array_pstrcat(r->pool, (array_header*) r->subprocess_env, '='));
            write(myHandle, myBuf, strlen(myBuf));
        }
        close(myHandle);
    }

which wrote this for three requests: two over HTTP, the last over HTTPS.
mod_jk.c:663 r: 0x01856238 r->subprocess_env: 0x01856548
mod_jk.c:666 r->subprocess_env: SCRIPT_URL=/services/MyServlet
mod_jk.c:663 r: 0x01856238 r->subprocess_env: 0x01856548
mod_jk.c:666 r->subprocess_env: SCRIPT_URL=/services/MyServlet
mod_jk.c:663 r: 0x01856238 r->subprocess_env: 0x01856548
mod_jk.c:666 r->subprocess_env: SCRIPT_URL=/services/Style.css

Enabling the generation of core dumps for use with gdb is proving awkward.

Comment 3 Patrick Middleton 2008-12-05 09:16:19 UTC
so we have all the magic tested and working as per http://developer.apple.com/technotes/tn2004/tn2123.html for enabling core dumps, and even a CoreDumpDirectory directive in /etc/httpd/httpd.conf , but we get no core dumps.

One thing I have noticed is that httpd seems to take several seconds to crash; as if steadily iterating until it blows the stack up, which might explain why the the dead address looks like a bit of ASCII string.

I have also noted that my JkMount directives aren't working as expected.  Using older mod_jk versions, they worked with everything in the main httpd.conf ; with newer versions, some weren't working at all with HTTPS, so I moved most of the mod_jk configuration into the virtual host containers; and it's still not right.  Depending on the URL, some requests get 404 errors from the webserver, and some cause crashes.

mod_jk.c:663 r: 0x01807100 r->subprocess_env: 0x01807468
mod_jk.c:666 r->subprocess_env: REDIRECT_SCRIPT_URL=/services/=REDIRECT_SCRIPT_URI=http://localhost/services/=REDIRECT_STATUS

Comment 4 Rainer Jung 2008-12-05 09:39:35 UTC
(In reply to comment #3)
> so we have all the magic tested and working as per
> http://developer.apple.com/technotes/tn2004/tn2123.html for enabling core
> dumps, and even a CoreDumpDirectory directive in /etc/httpd/httpd.conf , but we
> get no core dumps.

I didn't read the document, but what could help: if you start as root (e.g. when using low ports) and switch user via User/Group in httpd.conf, use a high port instead and run directly as a normal user. Most OSes are reluctant to dump core for processes which were switching user. Another thing (maybe in the quoted doc?) is that limit/ulimit can prevent the writing of core dumps.

Concerning gdb: httpd -X starts only one httpd process, so you always know which you should attach to :) Of course that's not for production, because it can only handle one response at a time, but that's fine for testing.

> One thing I have noticed is that httpd seems to take several seconds to crash;
> as if steadily iterating until it blows the stack up, which might explain why
> the the dead address looks like a bit of ASCII string.

Possible, see below for the ap_table_get code, which indeed does an iteration.

> I have also noted that my JkMount directives aren't working as expected.  Using
> older mod_jk versions, they worked with everything in the main httpd.conf ;
> with newer versions, some weren't working at all with HTTPS, so I moved most of
> the mod_jk configuration into the virtual host containers; and it's still not
> right.  Depending on the URL, some requests get 404 errors from the webserver,
> and some cause crashes.

That should be an independent thing. In fact in 1.2.26 we cleaned up the vhost handling and as a result there was exactly one incompatible change, namely the relation between JkMount and vhosts. For testing, you can rule out that effect by e.g. putting them all globalyy and adding to the global server "JkMountCopy All". If you put them into the VHosts, you might need to add them to several VHosts (e.g. one for http, one for https).


Concerning your previous comment:

> which wrote this for three requests: two over HTTP, the last over HTTPS.
> mod_jk.c:663 r: 0x01856238 r->subprocess_env: 0x01856548
> mod_jk.c:666 r->subprocess_env: SCRIPT_URL=/services/MyServlet
> mod_jk.c:663 r: 0x01856238 r->subprocess_env: 0x01856548
> mod_jk.c:666 r->subprocess_env: SCRIPT_URL=/services/MyServlet
> mod_jk.c:663 r: 0x01856238 r->subprocess_env: 0x01856548
> mod_jk.c:666 r->subprocess_env: SCRIPT_URL=/services/Style.css

OK, so it's not NULL, and after the last output it crashes (because it's an
HTTP request), right?

Unfortunately I can't see any use of memcpy in httpd's ap_table_get:

API_EXPORT(const char *) ap_table_get(const table *t, const char *key)
{
    table_entry *elts = (table_entry *) t->a.elts;
    int i;

    if (key == NULL)
        return NULL;

    for (i = 0; i < t->a.nelts; ++i)
        if (!strcasecmp(elts[i].key, key))
            return elts[i].val;

    return NULL;
}

Maybe the strcasecmp doesn't terminate, because one of the two strings is not correctly 0-terminated?

Unfortunately I don't have access to MacOS X, so that will be hard to debug for
me. You could:

1) delete line 659 and see, whether it runs stable then

2) replace the call to ap_table_get in line 659 by a call to my_table_get and
copy the above ap_table_get under the name my_table_get into mod_jk.c. This
code can then be instrumented with more output statements like you already did,
to get an idea what's wrong and where exactly it crashes.


Comment 5 Patrick Middleton 2008-12-08 03:02:35 UTC
I tried the additional debugging tips (run a single server using -X, as a user, listening on nonprivileged ports) but still couldn't create a coredump.

I tried the hint about 'JkMountCopy All', and cleaned up my httpd.conf.  There are no longer mod_jk directives in virtual host containers.  httpd no longer crashes.

I'd been using mod_jk directives in virtual host containers because there has been a problem with static URIs /services.html and servlet-mapped URIs /services/ and /services/* where neither would work correctly.  Using mod_jk 1.2.27 with JkMountCopy All in the global server configuration, that problem is gone too.

For https://localhost/services/index.jsp:
mod_jk.c:666 r->subprocess_env: SCRIPT_URL=/services/index.jsp=SCRIPT_URI

For https://localhost/services/:
mod_jk.c:666 r->subprocess_env: REDIRECT_SCRIPT_URL=/services/=REDIRECT_SCRIPT_URI=https://localhost/services/=REDIRECT_HTTPS=on=REDIRECT_STATUS

At this point, I think there is a bug in how mod_jk works with virtual hosts, but which does not affect well-configured servers.  Since I'm making no progress debugging this and it's not crashing, do we want to close this bug report?
Comment 6 Rainer Jung 2008-12-08 03:20:17 UTC
Thanks for your feedback.

Do you have a test system, on which you can reproduce? In other words: are you able and willing to dig into the problem some more?

If yes, It would be very helpful to kopy the ap_table_get code from comment #4 into your mod_jk.c (under a different name, like my_table_get) and replace the call to ap_table_get in line 659 with my_table_get. Then finally add some output statements in my_table_get to check, whether it loops indefinitely, or whether it is actually the strcasecmp that crashes.

Unfortunately I can't reproduce, and I'm still not convinced, that it i a JK bug. I've seen similar crashes when other modules corupt memory. So I'm still interested in getting a clearer picture about what's wrong.
Comment 7 Patrick Middleton 2008-12-09 04:01:05 UTC
Created attachment 23004 [details]
sample of debugging code, output, and httpd.conf files

Having dug in some more ... 

I wondered if I ought to try blaming mod_gzip/1.3.26.1a, but removing it made no difference.

I didn't keep my original problematic httpd.conf file.  It took me some time to put some changes back in to show the problem.  When looking at the files I have included in the attachment (the httpd.conf files have been removed of most of the things my Corporate Security Officer would not want disclosed) you might notice a SetHandler jakarta-servlet directive somewhere in the bad one; I was getting random Access Forbidden and Resource Not Available responses for requests that should have been handled by my servlet.



My original stacktrace was generated on PPC, a RISC architecture where out-of-order execution is rampant, and showed references to things like memcpy() which didn't make sense for the reported line number of the source code.  Having added some debugging code, the point at which the bug is reported has moved; it is now more consistently this line:
'            s->extension.fail_on_status = ap_palloc(r->pool, e->fail_on_status_size * sizeof(int));'

and the debugging output I have included that for requests with the 'good' configuration, the rule_extension structure is sensible (the timeout is -1, every other field is 0 or null) but for requests with the 'bad' configuration, that structure contains random rubbish which causes the dysfunctional call to memcpy().

So ... why are the contents of 'rule_extension_t *e' bad?  Are we getting a pointer to something that's been overwritten, or being given a pointer to something that was never initialized?
Comment 8 Rainer Jung 2008-12-09 04:30:53 UTC
Aha, progress!
In jk_translate there are two lines, where we use

ap_set_module_config(r->request_config, &jk_module, ...)

Those are the lines were we set the rule_extension_t. So if you could output the value of the last argument (as a pointer, not the structure it points to) there and of e after

e = (rule_extension_t *)ap_get_module_config(r->request_config, &jk_module);

we can see, whether the request_config changes (e changes), or if not it's actually the contents of e that change.

You should also output the address of r in the three lines, so we can check whether actually the request changes between seeting the structure and retrieving it. That would be something we don't expect, but maybe there are sub requests or something similar involved.

I'll try to reproduce later today, now that we are closer to the root cause.


Comment 9 Patrick Middleton 2008-12-09 09:33:20 UTC
What I found was that in jk_translate, in the 'good' configuration, e (or, *ext ... the rule_extension_t) remained null after the first call to ap_set_module_config().  There is a second call to ap_set_module_config().  My logging shows that the rule_extension_t has been set before this, and also before the call to map_uri_to_worker_ext() a few lines earlier.

In the 'bad' configuration, a crash was occurring before getting as far as three logging calls I added around map_uri_to_worker_ext() and ap_set_module_config().

I noticed that in the 'bad' configuration, there were no JkMount directives at all, so I added some.  This did not prevent crashing.  Later, the 'good' configuration was crashing too.  Right now neither are crashing.  Intermittent problems of /services/ , /services/index.jsp and /services.html producing server 404 pages or Tomcat resource-unavailable pages continues.  Using <Location> to add a 'SetHandler jakarta-servlet' directive continues to move in and out of the configuration.  I am guessing wildly here, but from the symptoms it looks to me like some memory is being allocated for a control structure like a rule_extension_t, and then not initialised by code because it appears to have been initialised already.
Comment 10 Rainer Jung 2008-12-17 16:33:45 UTC
Could you please try the patch available at:

http://people.apache.org/~rjung/mod_jk-dev/patches/extension_crash.patch

It came up for a related issue reported today and it could well fix your problem.
Comment 11 Patrick Middleton 2008-12-18 02:42:02 UTC
Patch applied, works for me -- but then, apart from the first day I found this crasher, I have never been able to reproduce it reliably.

For the benefit of anybody else looking at the summary of this bug in the future, yes it's wrong, it should be something like "random crashes and failure to handle some URLs".

We have a small number of development systems that are still running MacOSX 10.1.5 with Apache 1.3.26.  I've built the patched mod_jk 1.2.27 for those.  It works there with essentially the same httpd.conf -- except I need to use 'SetHandler jakarta-servlet', JkMount and JkMountCopy do not work as expected.  This is a problem I have found intermittently on MacOSX 10.4.11 with Apache 1.3.41 and could be an unrelated bug.
Comment 12 Rainer Jung 2008-12-18 02:57:03 UTC
Patch commited as r727693.
Will be part of 1.2.28.
Thanks for helping debugging it.