Hi,

With kernels from 3.2.9 (inclusive) to 3.2.17 (exclusive), there was an arbitrary limit of 1000 on epoll paths, which causes Apache to deadlock when it runs 1001 or more processes.

The first patch, http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=28d82dc1c4edbc352129f97f4ca22624d1fe61de, introduced the limit of 1000. The second patch, http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=93dc6107a76daed81c07f50215fa6ae77691634f, no longer limits epoll for non-nested paths (so Apache works again).

This limitation exposes a bug in Apache that leads to a deadlock: if an httpd process gets an error from epoll_ctl, it continues to run. If it then acquires the accept_mutex, epoll_wait returns 0 because the preceding epoll_ctl calls failed and no listener was registered, and Apache is blocked.

Here is a short strace of the 1001st process:

    epoll_create1(O_CLOEXEC) = 39
    epoll_ctl(39, EPOLL_CTL_ADD, 6, {EPOLLIN, {u32=1010443880, u64=140193037952616}}) = -1 EINVAL (Invalid argument)
    epoll_ctl(39, EPOLL_CTL_ADD, 4, {EPOLLIN, {u32=1010443880, u64=140193037952616}}) = -1 EINVAL (Invalid argument)
    semop(14385470, {{0, -1, SEM_UNDO}}, 1 <unfinished ...>
    <... semop resumed> ) = 0
    epoll_wait(39, <unfinished ...>
    <... epoll_wait resumed> {}, 2, 10000) = 0

To reproduce:
- get a kernel with the limitation (3.2.9 to 3.2.16 for the 3.2 branch)
- configure httpd to listen on at least 2 ports (80 and 81) so that it uses the accept_mutex
- configure httpd with "StartServers 1001"
- start it with strace -f /etc/init.d/httpd start > ~/debug.log
- make a lot of requests until it stops responding

An httpd process whose epoll_ctl call fails should kill itself or retry epoll_ctl.

This bug was uncovered on CentOS 6.3 with httpd 2.2.15 and a 3.2.13 kernel, but I have read other threads mentioning the 1000 httpd process limit on Ubuntu: https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/1028470 (so it is certainly still present in 2.2.22).

I have set the severity to normal because updating the kernel makes Apache work again.
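
The failure mode in the strace can be illustrated outside of httpd. The following is a minimal sketch, not Apache code: the function name wait_after_ignored_add_failure is hypothetical, and the failed registration is provoked with an invalid descriptor (which yields EBADF) instead of the kernel path limit (which yields EINVAL). The point it shows is the same: once the epoll_ctl error is ignored, the epoll set is empty and epoll_wait can only time out with 0.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: ignore a failed EPOLL_CTL_ADD, then wait.
 * Returns the result of epoll_wait, or -1 if the epoll instance
 * could not be created. */
int wait_after_ignored_add_failure(int timeout_ms)
{
    struct epoll_event ev = { 0 };
    struct epoll_event out[2];
    int epfd, n;

    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd == -1)
        return -1;

    ev.events = EPOLLIN;
    ev.data.fd = -1;
    /* fd -1 is invalid, so this add fails (EBADF here, not the EINVAL
     * seen in the report); the result is deliberately discarded to
     * mimic the unchecked call. */
    (void) epoll_ctl(epfd, EPOLL_CTL_ADD, -1, &ev);

    /* Nothing is registered, so no event can ever be reported and the
     * call returns 0 once the timeout expires. */
    n = epoll_wait(epfd, out, 2, timeout_ms);
    close(epfd);
    return n;
}
```

Calling wait_after_ignored_add_failure(10) from a small main() on Linux should return 0 after about 10 ms, matching the "epoll_wait ... = 0" line in the strace.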
In the latest Apache 2.2.x code, the child_main() function in prefork.c is not checking the status code after calling apr_pollset_add(). Here is an excerpt:

    for (lr = ap_listeners, i = num_listensocks; i--; lr = lr->next) {
        apr_pollfd_t pfd = { 0 };

        pfd.desc_type = APR_POLL_SOCKET;
        pfd.desc.s = lr->sd;
        pfd.reqevents = APR_POLLIN;
        pfd.client_data = lr;

        /* ### check the status */
        (void) apr_pollset_add(pollset, &pfd);
    }

This code has been improved in Apache 2.4.x. svn blame shows the following revisions:

     101799  gstein      for (lr = ap_listeners, i = num_listensocks; i--; lr = lr->next) {
     101799  gstein          apr_pollfd_t pfd = { 0 };
     101799  gstein
     101799  gstein          pfd.desc_type = APR_POLL_SOCKET;
     101799  gstein          pfd.desc.s = lr->sd;
     101799  gstein          pfd.reqevents = APR_POLLIN;
     101799  gstein          pfd.client_data = lr;
     101799  gstein
     804764  rpluem          status = apr_pollset_add(pollset, &pfd);
     804764  rpluem          if (status != APR_SUCCESS) {
    1393382  jorton              /* If the child processed a SIGWINCH before setting up the
    1393382  jorton               * pollset, this error path is expected and harmless,
    1393382  jorton               * since the listener fd was already closed; so don't
    1393382  jorton               * pollute the logs in that case. */
    1393382  jorton              if (!die_now) {
    1393382  jorton                  ap_log_error(APLOG_MARK, APLOG_EMERG, status, ap_server_conf, APLOGNO(00157)
    1393382  jorton                               "Couldn't add listener to pollset; check system or user limits");
    1393382  jorton                  clean_child_exit(APEXIT_CHILDSICK);
    1393382  jorton              }
    1393382  jorton              clean_child_exit(0);
     804764  rpluem          }
     757853  trawick
     757853  trawick         lr->accept_func = ap_unixd_accept;
      96102  rbb         }
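
For readers without an APR build at hand, the same "check the status" pattern can be sketched directly against epoll. This is an illustration under stated assumptions, not the httpd fix: add_listener_checked is a hypothetical helper, and it calls epoll_ctl itself where httpd goes through apr_pollset_add().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical helper: register fd for read events and report a
 * failure instead of discarding it. Returns 0 on success and -1 on
 * error, with errno preserved from epoll_ctl. */
int add_listener_checked(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        int saved = errno;  /* fprintf may overwrite errno */
        fprintf(stderr, "could not add fd %d to epoll set: %s\n",
                fd, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

A child process would call this once per listening socket and exit when it returns -1, which mirrors the clean_child_exit(APEXIT_CHILDSICK) path in the 2.4.x code above, so that it never waits on an empty epoll set while holding the accept mutex.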
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.