Bug 34514 - On a busy server with MPM Worker apache 2.0.54 crash permanently
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mpm_worker
Version: 2.0.54
Hardware: Other Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Duplicates: 34980
Reported: 2005-04-19 09:37 UTC by hipodilski
Modified: 2005-10-28 09:46 UTC

Attachments
treat thread creation as non-fatal (1.08 KB, patch)
2005-04-28 17:11 UTC, Joe Orton

Description hipodilski 2005-04-19 09:37:14 UTC
Hi there. I recently updated to Apache 2.0.54, compiled it, and started it.
After a period of time Apache died with these messages in the error_log:
[Mon Apr 18 18:46:47 2005] [notice] Apache configured -- resuming normal operations
[Tue Apr 19 03:19:18 2005] [error] server reached MaxClients setting, consider
raising the MaxClients setting
[Tue Apr 19 03:19:35 2005] [alert] (11)Resource temporarily unavailable:
apr_thread_create: unable to create listener thread
[Tue Apr 19 03:19:35 2005] [alert] (11)Resource temporarily unavailable:
apr_thread_create: unable to create worker thread
[Tue Apr 19 03:19:35 2005] [alert] (11)Resource temporarily unavailable:
apr_thread_create: unable to create worker thread
[Tue Apr 19 03:19:35 2005] [alert] (11)Resource temporarily unavailable:
apr_thread_create: unable to create worker thread
[Tue Apr 19 03:19:35 2005] [alert] (11)Resource temporarily unavailable:
apr_thread_create: unable to create worker thread
[Tue Apr 19 03:19:35 2005] [alert] (11)Resource temporarily unavailable:
apr_thread_create: unable to create worker thread
[Tue Apr 19 03:19:35 2005] [alert] (11)Resource temporarily unavailable:
apr_thread_create: unable to create listener thread
[Tue Apr 19 03:19:50 2005] [alert] Child 16015 returned a Fatal error... Apache
is exiting!
I have had no problems on the same servers with Apache 2.0.53, so the problem
looks new. Here are the configure options I use for Apache:

#! /bin/sh
#
# Created by configure

"./configure" \
"--disable-autoindex" \
"--libdir=/usr/local/lib" \
"--disable-negotiation" \
"--with-mpm=worker" \
"--disable-userdir" \
"--disable-ssl" \
"--enable-include" \
"--enable-auth" \
"--enable-auth-digest" \
"--enable-vhost-alias" \
"--enable-access" \
"--enable-asis" \
"--enable-imap" \
"--enable-actions" \
"--disable-autoindex" \
"--enable-env" \
"--enable-setenvif" \
"--enable-mime" \
"--disable-charset-lite" \
"--enable-auth-digest" \
"--enable-log-config" \
"--enable-deflate" \
"--enable-http" \
"--enable-rewrite" \
"--enable-so" \
"--enable-dir" \
"--enable-nonportable-atomics=yes" \
"--sysconfdir=/usr/local/apache2/conf" \
"--sbindir=/usr/local/apache2/sbin" \
"--bindir=/usr/local/apache2/bin" \
"--enable-status" \
"$@"
It happened twice on the same server within one day. The server is running
Fedora Core 1. Another problem I observed: after compiling, Apache was linked
against libiconv.so but could not find the library at run time (the library is
in /usr/local/lib). When I specify --libdir=/usr/local/lib the problem goes
away, but I guess this should not be the default behaviour. The strange thing
about this second problem is that configure and make produced no error
messages, yet the resulting binary was linked against a library it could not
find.

I hope the problems will be fixed soon.

Tons of regards, 
hip0
-=-=
Comment 1 Joe Orton 2005-04-19 14:57:36 UTC
What configuration are you using for worker? (settings of MaxClients etc)

W.r.t. the second problem: you should just avoid installing libiconv on a
modern Linux system; the libc already includes an iconv implementation.
Comment 2 hipodilski 2005-04-19 16:01:51 UTC
Here are the MaxClients etc. settings.
Timeout 10
KeepAlive On
MaxKeepAliveRequests 10
KeepAliveTimeout 5
MaxClients      300
StartServers     5
MinSpareServers 10
MaxSpareServers 30
StartServers    2
MinSpareThreads 50
MaxSpareThreads 150
ThreadsPerChild 25
MaxRequestsPerChild  10000
ServerLimit 400
UseCanonicalName Off


If you need anything else, just tell me. I hope this helps.
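
As a rough cross-check of the settings above, the number of worker child processes follows from MaxClients and ThreadsPerChild. The sketch below assumes the worker MPM uses integer division here (a MaxClients value that is not a multiple of ThreadsPerChild is rounded down). The function names are made up for illustration and are not part of httpd.

```c
/* Upper bound on worker child processes: MaxClients / ThreadsPerChild.
   Assumption: integer division, so a non-multiple is rounded down. */
static int max_children(int max_clients, int threads_per_child)
{
    if (threads_per_child <= 0)
        return 0;   /* invalid setting; nothing sensible to compute */
    return max_clients / threads_per_child;
}

/* Worker threads at full load, not counting per-child helper threads
   such as the listener thread. */
static int max_worker_threads(int max_clients, int threads_per_child)
{
    return max_children(max_clients, threads_per_child) * threads_per_child;
}
```

For MaxClients 300 and ThreadsPerChild 25 this gives 12 child processes and 300 worker threads, before counting the listener thread that each child also creates.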
Comment 3 hipodilski 2005-04-21 09:07:12 UTC
I reverted to the old Apache 2.0.53, and the same thing happened again. The
only thing I've changed on the server recently was the ulimits for all users, so
the bug is probably related to that. All users had a soft limit of 300
processes and a hard limit of 400. Here is my output of ulimit -a:
[root@fresh-ringtones logs]# ulimit -a
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 10240
cpu time             (seconds, -t) unlimited
max user processes            (-u) 300
virtual memory        (kbytes, -v) unlimited
[root@fresh-ringtones logs]# logout

Anyway, I think Apache should not exit in this condition but rather sleep for a
period of time and then try to reinitialize its children or something. You
developers should know best.
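
The "max user processes" value above is the RLIMIT_NPROC resource limit. A minimal sketch for reading it from C follows, assuming a Linux-like system where RLIMIT_NPROC is available; the helper names are hypothetical. On Linux, threads also count against this limit according to getrlimit(2), so with 300 worker threads plus helper threads at MaxClients 300, a limit of 300 leaves essentially no headroom.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the soft and hard limits that `ulimit -u` reports.
   Returns 0 on success, -1 on failure (errno is set by getrlimit). */
static int read_nproc_limit(struct rlimit *out)
{
    return getrlimit(RLIMIT_NPROC, out);
}

/* Print one limit value, showing RLIM_INFINITY as "unlimited". */
static void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu\n", label, (unsigned long long)value);
}
```

The limit that matters is the one seen by the user the httpd child processes run as, which may differ from what a root shell prints.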
Comment 4 Kelly Price 2005-04-27 18:52:16 UTC
I have the same thing on a Debian sarge install of Linux.  Relevant logs:

[Wed Apr 27 12:17:15 2005] [notice] Apache/2.0.52 (Unix) mod_ssl/2.0.52 OpenSSL/0.9.7e DAV/2 mod_python/3.1.3 Python/2.3.5 configured -- resuming normal operations
[Wed Apr 27 12:17:17 2005] [alert] (12)Cannot allocate memory: apr_thread_create: unable to create worker thread
[Wed Apr 27 12:17:28 2005] [alert] Child 32004 returned a Fatal error...\nApache is exiting!

I'm also letting the exchange4linux folks know, because this version is
bundled with that program.
Comment 5 hipodilski 2005-04-28 10:14:57 UTC
Hi there. The problem is probably the ulimit settings on your server. Check
the output of ulimit -a and your limits.conf file. If that's the problem, try
raising the maximum number of processes. I tried that and ended up removing the
process limits completely, as they were in the beginning. Apache now works
with no problems. This is probably a bug the Apache developers should look at
when they have a little extra time, since a workaround exists.
Comment 6 Joe Orton 2005-04-28 17:06:19 UTC
You can actually distinguish the two conditions by the error message:

- kernel thread limit => "Cannot allocate memory" error
- configured ulimit => "Resource temporarily unavailable" error
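
The two messages correspond to the errno values shown in parentheses in the logs: (11) is EAGAIN and (12) is ENOMEM on Linux. A small sketch of the rule of thumb above, with made-up names; the mapping is only the heuristic from this comment, not something the C library or kernel guarantees.

```c
#include <errno.h>

/* Hypothetical labels for the two causes described in this comment. */
typedef enum {
    LIMIT_CONFIGURED_ULIMIT,   /* "Resource temporarily unavailable" */
    LIMIT_KERNEL_THREADS,      /* "Cannot allocate memory" */
    LIMIT_UNKNOWN
} limit_kind;

/* Map the error code from a failed thread creation to a likely cause. */
static limit_kind classify_thread_create_error(int err)
{
    switch (err) {
    case EAGAIN:
        return LIMIT_CONFIGURED_ULIMIT;
    case ENOMEM:
        return LIMIT_KERNEL_THREADS;
    default:
        return LIMIT_UNKNOWN;
    }
}
```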
Comment 7 Joe Orton 2005-04-28 17:11:42 UTC
Created attachment 14867
treat thread creation as non-fatal

I do wonder why this is a fatal error, and I'm not the first to do so. You
could try the above patch which treats this as a transient problem and will not
force the server to exit.

Delaying the exit of the child for 10 seconds *without* destroying all
previously created threads for this child seems pretty unfriendly, it must be
said.
Comment 8 Jeff Trawick 2005-04-28 17:24:00 UTC
Here's a similar patch I've given people in the past:

http://people.apache.org/~trawick/worker_mpm_pthread_create_resume.patch

One way that it differs is that it keeps the delay, but in the parent.

What bothers me about these patches is that for 9 users out of ten that see this
(in my experience), the problem is a showstopper configuration error that
prevents even the first child process from starting up properly.  In that case,
it is best for the web server to exit.  A better patch would still exit if the
children created at startup couldn't initialize, but keep running if transient
errors were hit later.
Comment 9 Greg Ames 2005-05-03 20:51:51 UTC
Here is yet another patch:

http://people.apache.org/~gregames/thread_create_recovery.patch

design:

1. exit with APEXIT_CHILDSICK for thread create failures (same as other patches)
2. add logic to the parent to decide how bad these errors really are.  if we
can't initialize a single worker process, just give up.  otherwise treat these
as transient errors.
3. nuke the 10 sec delays.  it's better to let the parent know what's happening
right away. if the parent creates a fork-a-thon, the parent is broken.
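
The parent-side decision in point 2 could look roughly like the sketch below. This is not the committed patch: the struct, the flag, and the function are hypothetical, and the numeric exit codes are illustrative values chosen for the example. It only shows the idea of treating a "sick" child as fatal when no child has ever come up, and as transient otherwise.

```c
#include <stdbool.h>

/* Illustrative exit codes; the numeric values are assumptions. */
enum { CHILD_EXIT_SICK = 7, CHILD_EXIT_FATAL = 15 };

typedef enum { PARENT_CONTINUE, PARENT_SHUTDOWN } parent_action;

/* Hypothetical parent bookkeeping. */
typedef struct {
    bool any_child_initialized;   /* true once any child has fully started */
} parent_state;

/* Decide what the parent does when a child exits with exit_code. */
static parent_action on_child_exit(const parent_state *st, int exit_code)
{
    if (exit_code == CHILD_EXIT_FATAL)
        return PARENT_SHUTDOWN;          /* unrecoverable: give up */
    if (exit_code == CHILD_EXIT_SICK)
        /* transient only if the server has been working before */
        return st->any_child_initialized ? PARENT_CONTINUE : PARENT_SHUTDOWN;
    return PARENT_CONTINUE;              /* ordinary child exit */
}
```

In this sketch a sick child only brings the server down while nothing has ever started successfully, which covers the startup misconfiguration case Jeff describes in comment 8.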
  
Comment 10 Greg Ames 2005-05-04 22:05:21 UTC
fix committed to 2.1 trunk, revision 168182
Comment 11 Shane Brath 2005-05-06 04:32:51 UTC
I have applied Greg's patch, and I still have the same problem. Here is a log 
from my server. I can reproduce this almost daily, because I have a surge of 
traffic in the mornings. Suggestions?


[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:05 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:09 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:09 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:10 2005] [error] (12)Cannot allocate memory: fork: Unable to 
fork new process
[Thu May 05 06:35:10 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:10 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:10 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:20 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Thu May 05 06:35:21 2005] [alert] Child 19934 returned a Fatal error... 
Apache is exiting!
[Thu May 05 06:35:24 2005] [warn] child process 18126 still did not exit, 
sending a SIGTERM
[Thu May 05 06:35:24 2005] [warn] child process 18173 still did not exit, 
sending a SIGTERM
[ many many lines for other processes were deleted ]
[Thu May 05 06:35:30 2005] [error] child process 18126 still did not exit, 
sending a SIGKILL
[Thu May 05 06:35:30 2005] [error] child process 18173 still did not exit, 
sending a SIGKILL
[ deleted many like repeated lines ]
[Thu May 05 06:56:30 2005] [notice] Digest: generating secret for digest 
authentication ...
[Thu May 05 06:56:30 2005] [notice] Digest: done
[Thu May 05 06:56:31 2005] [warn] pid file /usr/local/apache2/run/httpd.pid 
overwritten -- Unclean shutdown of previous Apache run?
[Thu May 05 06:56:31 2005] [notice] Apache/2.0.54 (Unix) configured -- 
resuming normal operations
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

# config.log file data:

$ ./configure --enable-so --with-mpm=worker --enable-expires --enable-headers --enable-modules=all

## --------- ##
## Platform. ##
## --------- ##

hostname = xxxxx
uname -m = i686
uname -r = 2.4.21-27.ELsmp
uname -s = Linux
uname -v = #1 SMP Wed Dec 1 21:59:02 EST 2004

# httpd.conf settings:

ServerLimit 128
StartServers 2
MaxClients 2050
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 50
MaxRequestsPerChild 0
Comment 12 Jeff Trawick 2005-05-06 13:36:42 UTC
I just took a look at the latest worker.c with Greg's patch.  There are three
places where threads are created in worker MPM, and Greg neglected to update one
of the paths.

Here is the code which is still unpatched:

    rv = apr_thread_create(&start_thread_id, thread_attr, start_threads,
                           ts, pchild);
    if (rv != APR_SUCCESS) {
        ap_log_error(APLOG_MARK, APLOG_ALERT, rv, ap_server_conf,
                     "apr_thread_create: unable to create worker thread");
        /* In case system resources are maxxed out, we don't want
           Apache running away with the CPU trying to fork over and
           over and over again if we exit. */
        apr_sleep(apr_time_from_sec(10));
        clean_child_exit(APEXIT_CHILDFATAL);
    }

    mpm_state = AP_MPMQ_RUNNING;

Delete the line with apr_sleep() and change APEXIT_CHILDFATAL to APEXIT_CHILDSICK.
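
The effect of that change on the child side can be sketched without APR as follows. Everything here is hypothetical scaffolding: thread creation is passed in as a function pointer so that the failure path can be exercised, the exit code value is illustrative, and the function returns the status instead of exiting the process as the real code does via clean_child_exit().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Illustrative status codes; the numeric values are assumptions. */
enum { CHILD_OK = 0, CHILD_EXIT_SICK = 7 };

/* Stand-in for thread creation: returns 0 or an errno value. */
typedef int (*thread_creator)(void);

static int fake_create_ok(void)     { return 0; }
static int fake_create_eagain(void) { return EAGAIN; }

/* Try to start a thread; on failure, log and report "sick" right away.
   There is no 10 second sleep: the parent decides how to react. */
static int child_start_status(thread_creator create_thread)
{
    int rv = create_thread();
    if (rv != 0) {
        fprintf(stderr, "unable to create worker thread: %s\n", strerror(rv));
        return CHILD_EXIT_SICK;
    }
    return CHILD_OK;
}
```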
Comment 13 Greg Ames 2005-05-09 16:48:45 UTC
oops.  Jeff is correct.  I missed one in worker and the same one in event.  I
committed the change to httpd-2.1 trunk as revision 168649.
Comment 14 Shane Brath 2005-05-09 19:41:30 UTC
After the fix from Jeff, I no longer crash. All I get now are messages in the 
error_log, mostly the thread error, but the server does not go down.


[Mon May 09 06:35:04 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Mon May 09 09:35:06 2005] [error] (12)Cannot allocate memory: fork: Unable to 
fork new process
Comment 15 Jeff Trawick 2005-05-20 12:24:17 UTC
*** Bug 34980 has been marked as a duplicate of this bug. ***
Comment 16 inoue 2005-05-23 14:35:24 UTC
I'm not sure whether this problem was resolved, but my experience may be helpful.


My experience:
I had the same errors, as follows:
[Tue May 10 16:18:20 2005] [alert] (12)Cannot allocate memory:
apr_thread_create: unable to create worker thread

$ cat /etc/issue
Red Hat Enterprise Linux AS release 3 (Taroon)

After I ran 'export LD_ASSUME_KERNEL=2.4.1', the errors disappeared.
# Before I arrived at this fix, I had checked ulimit, /proc/sys/, etc.,
and found nothing wrong.
Comment 17 Andreas Oesterer 2005-05-31 21:28:50 UTC
I just patched the worker.c file, but apache still goes down. I'm not sure if 
it's the same issue or whether this is a new problem.

[Tue May 31 12:06:57 2005] [error] (12)Cannot allocate memory: fork: Unable to 
fork new process
[Tue May 31 12:06:57 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Tue May 31 12:07:00 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Tue May 31 12:07:08 2005] [alert] Child 9900 returned a Fatal error... Apache 
is exiting!
[Tue May 31 12:07:12 2005] [warn] child process 7822 still did not exit, 
sending a SIGTERM
[Tue May 31 12:07:12 2005] [warn] child process 7825 still did not exit, 
sending a SIGTERM
[Tue May 31 12:07:12 2005] [warn] child process 7893 still did not exit, 
sending a SIGTERM
Comment 18 Jeff Trawick 2005-05-31 21:34:10 UTC
Make sure you got the original patch as well as the follow-up change which I
mentioned.  The original patch was not complete.  From your error log snippet,
I'd expect the complete fix to the problem discussed here is sufficient to keep
Apache running.

This fix will be in the next 2.0.x release.
Comment 19 Andreas Oesterer 2005-05-31 22:12:57 UTC
I updated lines 882, 963 and 1160 in worker.c to use APEXIT_CHILDSICK instead 
of APEXIT_CHILDFATAL. I also commented out the calls to wait 10 seconds 
[apr_sleep(apr_time_from_sec(10));], as the comment by Jeff suggested. But I 
was not sure if this applies to all 3 locations or just the third one that was 
initially missed.

However apache still died twice after I patched it.
Comment 20 Jeff Trawick 2005-06-01 13:17:24 UTC
Andreas, I'm stumped.  The error log messages you show indicate that the child
only encountered a pthread_create failure

[Tue May 31 12:06:57 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread
[Tue May 31 12:07:00 2005] [alert] (12)Cannot allocate memory: 
apr_thread_create: unable to create worker thread

but some child returned APEXIT_CHILDFATAL to the parent:

[Tue May 31 12:07:08 2005] [alert] Child 9900 returned a Fatal error... Apache 
is exiting!

There are only the three locations you mentioned for the apr_thread_create
failure in worker.c.  I can only suggest that you confirm that you ran "make"
and "make install" and restarted Apache after making the source code change ;)

Comment 21 Joe Orton 2005-09-07 16:24:24 UTC
*** Bug 34980 has been marked as a duplicate of this bug. ***
Comment 22 Joe Orton 2005-10-28 17:46:23 UTC
All the transient failures should now be non-fatal as of 2.0.55, so if you still
see problems with 2.0.55 please reopen and include the error_log entries on failure.

Thanks for the report.