Created attachment 23071 [details]
GNU-Debugger-Output.txt

DESCRIPTION:
The Apache child processes begin to segfault once Apache reaches 130 children. This can be reproduced every time by setting MinSpareServers to a value higher than 130. It has been tested with 2.2.11, 2.2.10 and 2.2.9, which all show the same behaviour. 2.0.63 works fine.

The error_log contains:
...
[Sat Jan 03 01:32:34 2009] [notice] child pid 672 exit signal Segmentation fault (11)
[Sat Jan 03 01:32:34 2009] [notice] child pid 673 exit signal Segmentation fault (11)
[Sat Jan 03 01:32:34 2009] [notice] child pid 674 exit signal Segmentation fault (11)
...

Attached is the gdb output for one core dump.

CONFIGURATION:
./configure --prefix=/usr/local/bin/httpd --enable-suexec --with-suexec --enable-mods-shared=all --disable-imagemap

The following lines have been added to httpd.conf:
# ---------------------------------------------
<IfModule prefork.c>
StartServers 32
MinSpareServers 200
MaxSpareServers 400
ServerLimit 1600
MaxClients 1600
MaxRequestsPerChild 4000
</IfModule>
CoreDumpDirectory /tmp
# ---------------------------------------------
Please provide the following information:
- Kernel version
- glibc version
- Your httpd.conf
- Your ulimits (ulimit -a)

Please execute the following steps with gdb when you have loaded your core dump:
frame 1
p num_listensocks
Created attachment 23075 [details] httpd.conf
- Kernel version
2.6.27.9-159 (x86_64)

- glibc version
2.9

- Your httpd.conf
Default httpd.conf installed during installation, with the lines above added. My httpd.conf is attached.

- Your ulimits (ulimit -a)
I have already tried setting the limits as high as possible. The current settings are:
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) unlimited
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

- gdb steps
(gdb) frame 1
#1  0x00000000004494b6 in child_main (child_num_arg=<value optimized out>) at prefork.c:532
532         (void) apr_pollset_add(pollset, &pfd);
(gdb) p num_listensocks
$1 = 1
An additional note which may be important: I have just tested it with Apache 2.2.0, the first 2.2 release, and got the same behaviour.
Ruediger, is apr_pollset_create() failing due to some kernel limit?

Alex, you could confirm that by running with this tiny patch

Index: server/mpm/prefork/prefork.c
===================================================================
--- server/mpm/prefork/prefork.c (revision 731724)
+++ server/mpm/prefork/prefork.c (working copy)
@@ -485,7 +485,12 @@
     /* Set up the pollfd array */
     /* ### check the status */
-    (void) apr_pollset_create(&pollset, num_listensocks, pchild, 0);
+    status = apr_pollset_create(&pollset, num_listensocks, pchild, 0);
+    if (status != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_EMERG, status, ap_server_conf,
+                     "Couldn't initialize pollset in child");
+        clean_child_exit(APEXIT_CHILDFATAL);
+    }

and seeing if you get the new message. If you do, you might be able to work around whatever is causing the apr_pollset_create() failure by setting "apr_cv_epoll=no" when you configure.

(none of this tested ;) )
(In reply to comment #5)
> Ruediger, is apr_pollset_create() failing due to some kernel limit?

I don't know. I guess your patch will be very helpful in detecting why the creation of the pollset fails. IMHO the main difference between 2.0.63 and 2.2.x is that epoll is used, and I had the idea that we might be out of fd's, but this does not seem to be the case.
The epoll_create failure looks like some kind of weird kernel bug; strace output looks like the below:

read(9, "\2\0\0\0\1\0\0\0\0\0\0\0"..., 12) = 12
read(9, ""..., 0) = 0
close(9) = 0
setgroups(1, [48]) = 0
geteuid() = 0
setuid(48) = 0
epoll_create(1) = -1 EMFILE (Too many open files)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---

The fact that an fd was closed right before epoll_create() seems like sufficient evidence for this, and it's not some global fd limit being hit, since that would produce ENFILE.
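The EMFILE versus ENFILE distinction can be checked outside httpd with a small probe. This is only a minimal sketch, assuming Linux with glibc; the names probe_epoll_create and describe_epoll_errno are hypothetical helpers and are not part of httpd or APR.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create and immediately close one epoll instance.
 * Returns 0 on success, or -1 on failure with *err_out set to errno. */
int probe_epoll_create(int *err_out)
{
    assert(err_out != NULL);
    *err_out = 0;

    int fd = epoll_create(1);   /* the size argument is a hint, but must be > 0 */
    if (fd < 0) {
        *err_out = errno;       /* save errno before any other call can change it */
        return -1;
    }
    close(fd);
    return 0;
}

/* Hypothetical helper: map the errno values discussed in this thread
 * to a short explanation. */
const char *describe_epoll_errno(int err)
{
    switch (err) {
    case 0:
        return "no error";
    case EMFILE:
        return "per-process fd limit or per-user epoll instance limit reached";
    case ENFILE:
        return "system-wide open file limit reached";
    default:
        return strerror(err);
    }
}
```

Running probe_epoll_create() as the httpd user while the children are up, and passing the saved errno to describe_epoll_errno(), should show whether a per-user or a system-wide limit is involved.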
Bleh, this is annoying :( It's a new tunable setting. If you do

echo 1024 > /proc/sys/fs/epoll/max_user_instances

where 1024 is some value larger than MaxClients, then it works.

http://lkml.indiana.edu/hypermail/linux/kernel/0812.0/01183.html
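A check of the tunable against MaxClients could look roughly like the following. This is a sketch only: read_long_from_file and epoll_instance_limit_ok are hypothetical names, and treating a missing file as "no limit" is an assumption made here for kernels that do not expose the tunable.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one integer from a /proc-style file.
 * Returns -1 if the file is missing or does not start with a number. */
long read_long_from_file(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1) {
        value = -1;
    }
    fclose(f);
    return value;
}

/* Hypothetical helper: the limit is fine if the tunable is absent
 * (limit < 0, an assumption of this sketch) or larger than MaxClients. */
int epoll_instance_limit_ok(long limit, long max_clients)
{
    return limit < 0 || limit > max_clients;
}
```

For example, epoll_instance_limit_ok(read_long_from_file("/proc/sys/fs/epoll/max_user_instances"), 1600) would report whether the current setting leaves room for the MaxClients value from the original report.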
So I guess this transforms into a documentation bug and we should document it somewhere. Maybe add a platform specific section for Linux, like the one for Windows (http://httpd.apache.org/docs/2.2/en/platform/windows.html)?
At least the segfault is now fixed in trunk (r732414) by Jeff's patch. Instead an error message is logged and the child exits.
Patch proposed for backport to 2.2.x as r732465.
I can confirm that the patch is working in 2.2.11 and that raising max_user_instances solves the problem. Thanks for your efforts!

This may affect the worker MPM as well if you have more than 128 worker processes, which is probably very unlikely. A Linux-specific documentation section would be a good idea.
I'm going to see if we can either get the default setting raised or whether this can be turned into an rlimit which we can bump manually.
*** Bug 46501 has been marked as a duplicate of this bug. ***
Created attachment 23105 [details]
add more error logging for apr_pollset_create and apr_pollset_add

CONNECT in mod_proxy and mod_cgi also use apr_pollset_create. Therefore it is possible that this problem also occurs in worker or event mpm. There is now also the new max_user_watches limit that may affect apr_pollset_add (though the default seems high enough).

Here is a patch that does some more error checking for these calls.
Fixed in r733698.
Please look at the patch I submitted. At least mpm_worker and mod_cgi need to be changed, too.
FWIW, there's a thread on the kernel list about this and so far as I can tell the decision was to remove the default limits again: http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/01806.html
Hi Everybody,

seems like they've removed this setting in Kernel 2.6.28.4, which I installed today:

server:# ls /proc/sys/fs/epoll/
max_user_watches

So, max_user_instances is gone and so is the problem :-)

cheers,
Werner Detter
*** Bug 46856 has been marked as a duplicate of this bug. ***
*** Bug 47519 has been marked as a duplicate of this bug. ***
(In reply to comment #16)
> Fixed in r733698.

Nick, why weren't worker and beos fixed, too?

Also, apr_pollset_create() can fail if APR is built on a system that supports epoll_create1() (with a recent glibc/Linux kernel, for example) but run on a system that doesn't support it (an older Linux kernel).
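One way a caller could guard against the build/run mismatch described above is to fall back to epoll_create() when epoll_create1() reports ENOSYS. This is a minimal sketch, not how APR handles it; create_epoll_fd is a hypothetical name, and the sketch assumes Linux with a glibc that declares epoll_create1().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create a close-on-exec epoll fd, falling back to the
 * older epoll_create() when the running kernel lacks epoll_create1().
 * Returns the fd, or -1 with errno set. */
int create_epoll_fd(int size_hint)
{
    assert(size_hint > 0);      /* epoll_create() rejects size <= 0 */

    int fd = epoll_create1(EPOLL_CLOEXEC);
    if (fd >= 0) {
        return fd;
    }
    if (errno != ENOSYS) {
        return -1;              /* a real failure such as EMFILE; do not mask it */
    }

    /* Kernel without epoll_create1(): use the old call, then set
     * close-on-exec by hand. */
    fd = epoll_create(size_hint);
    if (fd < 0) {
        return -1;
    }

    int flags = fcntl(fd, F_GETFD);
    if (flags == -1 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Only ENOSYS triggers the fallback, so limit-related errors such as EMFILE still reach the caller and can be logged.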
Created attachment 24140 [details] worker beos (not tested)
Committed Stefan's patch as r804764 to trunk. It contains even more checks. BeOS support isn't present on trunk any longer, hence no changes there.
Hope to see it backported to 2.2.x.
(In reply to comment #25)
> Hope to see it backported to 2.2.x.

Yes, me too!