Bug 19849 - If the mod_cgid daemon process dies it is never restarted
Summary: If the mod_cgid daemon process dies it is never restarted
Status: CLOSED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_cgi
Version: 2.0.45
Hardware: Sun other
Importance: P3 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Duplicates: 22483 23533
Depends on:
Blocks:
 
Reported: 2003-05-12 13:33 UTC by Glenn Nielsen
Modified: 2004-11-16 19:05 UTC
CC List: 2 users



Attachments
patch to restart cgid daemon if it dies (6.23 KB, patch)
2003-05-29 02:11 UTC, Glenn Nielsen

Description Glenn Nielsen 2003-05-12 13:33:14 UTC
System: Sun Netra T1, Solaris 7, Apache 2.0.45 with the worker MPM.

While stress testing a binary CGI, the mod_cgid daemon died.
After it died, Apache detected this and logged the following message
to the error_log:
"No such process: cgid daemon is gone; is Apache terminating?"

Apache does not attempt to restart the cgid daemon process, so all
subsequent CGI requests fail.

Apache 2 with the worker MPM would handle CGI more robustly if mod_cgid
could restart its daemon process when it dies.
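Restarting a daemon when it dies is, in general, a supervision loop: the parent blocks in waitpid() until the child terminates and then starts a replacement. The following is a minimal, generic POSIX sketch of that idea, not mod_cgid's code (mod_cgid learns about its daemon through the cgid_maint callback instead). fake_daemon_main, start_daemon and supervise are hypothetical names, and max_restarts exists only so the sketch terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the daemon's main loop. It exits at once
 * to simulate the daemon dying unexpectedly. */
static void fake_daemon_main(void)
{
    _exit(1);
}

/* Fork a child that runs the (fake) daemon. Returns its pid, or -1. */
static pid_t start_daemon(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        fake_daemon_main();     /* does not return */
    }
    return pid;
}

/* Wait for the daemon and start a replacement each time it dies, up to
 * max_restarts replacements. Returns the number of restarts performed,
 * or -1 if fork() or waitpid() fails. */
static int supervise(int max_restarts)
{
    int restarts = 0;
    pid_t pid;

    assert(max_restarts >= 0);
    pid = start_daemon();
    if (pid < 0)
        return -1;

    for (;;) {
        int status;

        /* Block until the current daemon terminates (exit or signal). */
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)
                return -1;
        }
        if (restarts == max_restarts)
            break;              /* cap reached: leave it dead */

        pid = start_daemon();   /* the daemon died: start a new one */
        if (pid < 0)
            return -1;
        restarts++;
    }
    return restarts;
}
```

For example, supervise(3) replaces the daemon each of the first three times it dies and returns 3 once the last replacement has also exited. A production supervisor would normally add a delay between restarts so that a daemon that crashes immediately on start-up does not cause a tight fork loop.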
Comment 1 Glenn Nielsen 2003-05-28 16:21:56 UTC
I have done some more research.

When the cgid_daemon dies, I see the following error message (the pid is that
of the cgid_daemon):

[Wed May 28 11:05:59 2003] [notice] child pid 24470 exit signal Broken pipe (13)
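"exit signal Broken pipe (13)" means the process was terminated by SIGPIPE, which is signal 13 on Solaris (and on Linux). A small POSIX sketch that reproduces this kind of death and decodes it from the wait status with WIFSIGNALED()/WTERMSIG(); child_death_signal_on_broken_pipe is a hypothetical helper, not part of httpd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes to a pipe with no reader
 * and report how the child terminated. Returns the terminating signal
 * number, 0 if the child exited normally, or -1 on error. */
static int child_death_signal_on_broken_pipe(void)
{
    int fds[2];
    int status;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    close(fds[0]);              /* close the only read end before forking */

    pid = fork();
    if (pid < 0) {
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        sigset_t set;
        ssize_t n;

        /* Ensure SIGPIPE has its default (fatal) action in the child. */
        signal(SIGPIPE, SIG_DFL);
        sigemptyset(&set);
        sigaddset(&set, SIGPIPE);
        sigprocmask(SIG_UNBLOCK, &set, NULL);

        n = write(fds[1], "x", 1);  /* no reader: raises SIGPIPE */
        (void)n;
        _exit(0);                   /* reached only if SIGPIPE was not fatal */
    }

    close(fds[1]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);    /* SIGPIPE: 13 on Solaris and Linux */
    return 0;
}
```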

I added some debug logging to the cgid_maint function in mod_cgid.c.

When the cgid_daemon dies with a broken pipe, cgid_maint logs the following:

[Wed May 28 11:05:59 2003] [error] cgid_maint
[Wed May 28 11:05:59 2003] [error] cgid_maint APR_OC_REASON_DEATH
[Wed May 28 11:05:59 2003] [error] cgid_maint
[Wed May 28 11:05:59 2003] [error] cgid_maint APR_OC_REASON_UNREGISTER
[Wed May 28 11:05:59 2003] [error] cgid_maint DONE
[Wed May 28 11:05:59 2003] [error] cgid_maint DONE

The graceful-restart code in cgid_maint is triggered only when the reason is
APR_OC_REASON_LOST, but cgid_maint never receives that reason; as the log
above shows, it receives APR_OC_REASON_DEATH and then APR_OC_REASON_UNREGISTER.
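cgid_maint logged APR_OC_REASON_DEATH and APR_OC_REASON_UNREGISTER here, never APR_OC_REASON_LOST, so a restart keyed only on LOST never fires. A minimal sketch of a restart policy that also treats DEATH as a trigger, assuming the daemon should not be restarted while httpd itself is shutting down. This is not the attached patch: the OC_REASON_* enum is a hypothetical stand-in for APR's APR_OC_REASON_* constants (their real numeric values are not assumed), the per-reason comments are a paraphrase, and server_stopping is a hypothetical flag.

```c
#include <assert.h>

/* Hypothetical stand-ins for APR's APR_OC_REASON_* codes. */
enum oc_reason {
    OC_REASON_DEATH,        /* the child process died */
    OC_REASON_UNWRITABLE,   /* the registered write fd became unwritable */
    OC_REASON_RESTART,      /* the server is restarting; clean up */
    OC_REASON_UNREGISTER,   /* the registration is being removed */
    OC_REASON_LOST,         /* the child exited without being noticed */
    OC_REASON_RUNNING       /* a health check found the child alive */
};

/* Restart policy sketch: returns 1 if the daemon should be started again
 * for this reason, 0 otherwise. server_stopping is nonzero while httpd
 * itself is shutting down. */
static int should_restart_daemon(enum oc_reason reason, int server_stopping)
{
    switch (reason) {
    case OC_REASON_DEATH:   /* what cgid_maint actually received here */
    case OC_REASON_LOST:    /* what the original restart code waited for */
        return !server_stopping;
    default:
        /* RESTART and UNREGISTER are part of normal cleanup; UNWRITABLE
         * and RUNNING do not report a dead daemon. */
        return 0;
    }
}
```

With this policy, should_restart_daemon(OC_REASON_DEATH, 0) returns 1, while the UNREGISTER notification that follows returns 0, so only one restart is attempted per death.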

Here are all the log entries from when I reproduced this:
[Wed May 28 11:05:12 2003] [notice] Apache/2.0.45 (Unix) mod_ssl/2.0.45 OpenSSL/0.9.7a mod_jk/1.2.3 configured -- resuming normal operations
[Wed May 28 11:05:12 2003] [info] Server built: May 15 2003 21:22:19
[Wed May 28 11:05:57 2003] [error] [client 207.160.133.9] Premature end of script headers: counterdate.cgi
[Wed May 28 11:05:57 2003] [error] [client 207.160.133.9] unable to include "/cgi/counterdate.cgi" in parsed file /export/home/moxp/www/www/index.html
[Wed May 28 11:05:57 2003] [info] (32)Broken pipe: core_output_filter: writing data to the network
[Wed May 28 11:05:58 2003] [info] (32)Broken pipe: core_output_filter: writing data to the network
[Wed May 28 11:05:58 2003] [info] (32)Broken pipe: core_output_filter: writing data to the network
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] Premature end of script headers: counter.cgi
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] unable to include "/cgi/counter.cgi" in parsed file /export/home/moxp/www/www/index.html
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] Premature end of script headers: counterdate.cgi
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] unable to include "/cgi/counterdate.cgi" in parsed file /export/home/moxp/www/www/index.html
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] Premature end of script headers: counterdate.cgi
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] unable to include "/cgi/counterdate.cgi" in parsed file /export/home/moxp/www/www/index.html
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] Premature end of script headers: counterdate.cgi
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] unable to include "/cgi/counterdate.cgi" in parsed file /export/home/moxp/www/www/index.html
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] Premature end of script headers: counterdate.cgi
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] unable to include "/cgi/counterdate.cgi" in parsed file /export/home/moxp/www/www/index.html
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] Premature end of script headers: counter.cgi
[Wed May 28 11:05:58 2003] [error] [client 207.160.133.9] unable to include "/cgi/counter.cgi" in parsed file /export/home/moxp/www/www/index.html
[Wed May 28 11:05:59 2003] [notice] child pid 24470 exit signal Broken pipe (13)
[Wed May 28 11:05:59 2003] [error] cgid_maint
[Wed May 28 11:05:59 2003] [error] cgid_maint APR_OC_REASON_DEATH
[Wed May 28 11:05:59 2003] [error] cgid_maint
[Wed May 28 11:05:59 2003] [error] cgid_maint APR_OC_REASON_UNREGISTER
[Wed May 28 11:05:59 2003] [error] cgid_maint DONE
[Wed May 28 11:05:59 2003] [error] cgid_maint DONE
[Wed May 28 11:05:59 2003] [error] [client 207.160.133.9] (3)No such process: cgid daemon is gone; is Apache terminating?: /export/home/moxp/www/cgi/counter.cgi
Comment 2 Glenn Nielsen 2003-05-29 02:09:50 UTC
Attached is a patch that restarts the cgid daemon if it dies.
At first I tried a full server restart, as the cgid_maint code had
originally been set up to do (kill(getpid(), AP_SIG_GRACEFUL);). cgid worked
fine after Apache was restarted from cgid_maint, but I saw a number of the
following warning messages:

[Wed May 28 10:49:42 2003] [warn] long lost child came home! (pid 21018)
[Wed May 28 10:49:42 2003] [warn] long lost child came home! (pid 21019)

So I wrote a patch that restarts only the cgid daemon rather than Apache
itself. This patch seems to work fine.
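The first approach mentioned in this comment, kill(getpid(), AP_SIG_GRACEFUL), sends the graceful-restart signal to the process running cgid_maint (the httpd parent), which then restarts the whole server rather than just the daemon. A minimal sketch of that self-signalling mechanism, assuming SIGUSR1 as a stand-in for AP_SIG_GRACEFUL (httpd's Unix MPMs are understood to use SIGUSR1 for graceful restart, but treat that as an assumption). The handler only records the request; on_graceful and request_graceful_restart are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t keeps the write async-signal-safe. */
static volatile sig_atomic_t graceful_requested = 0;

static void on_graceful(int signo)
{
    (void)signo;
    graceful_requested = 1;     /* just record the request; act on it later */
}

/* Install a handler for sig, then signal our own process the way
 * kill(getpid(), AP_SIG_GRACEFUL) does. Returns the flag value seen
 * after kill() returns, or -1 on error. */
static int request_graceful_restart(int sig)
{
    struct sigaction sa, old;
    sigset_t set;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_graceful;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, &old) < 0)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, sig);
    sigprocmask(SIG_UNBLOCK, &set, NULL);   /* make sure it can be delivered */

    graceful_requested = 0;
    if (kill(getpid(), sig) < 0)
        return -1;
    /* For a single-threaded process with sig unblocked, POSIX requires the
     * signal to be delivered before kill() returns. */
    result = graceful_requested;

    sigaction(sig, &old, NULL);             /* restore the previous handler */
    return result;
}
```

In a single-threaded test, request_graceful_restart(SIGUSR1) returns 1 because the handler has already run when kill() returns.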
Comment 3 Glenn Nielsen 2003-05-29 02:11:26 UTC
Created attachment 6553
patch to restart cgid daemon if it dies
Comment 4 Glenn Nielsen 2003-05-29 02:15:50 UTC
A final note. We started upgrading some of our production Sun Solaris servers
to Apache 2 seven weeks ago. On two of those servers the cgid daemon died once
during that seven-week period. This resulted in CGIs failing until our
customers notified us of the problem, and we then had to restart (or stop and
start) Apache. The patch I submitted automatically restarts the cgid daemon, so
that only a few CGI requests fail rather than all of them failing until Apache
is restarted or stopped and started.
Comment 5 Jeff Trawick 2003-06-23 02:41:36 UTC
Thanks for your patch, and hopefully it will get reviewed/committed soon.

I wanted to mention that if you're running into any problems with cgid, you need
to apply this patch:

http://cvs.apache.org/viewcvs.cgi/httpd-2.0/modules/generators/mod_cgid.c.diff?r1=1.150&r2=1.151

That fixes a simple bug with horrendous consequences, including the murder of
the cgid daemon process (via a SIGPIPE raised in the library).
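Writing to a pipe or socket whose reading side is gone raises SIGPIPE, and its default action terminates the process, which matches the daemon deaths in this report. A common, general defensive pattern (not necessarily what the r1.151 change does) is to ignore SIGPIPE so that the failing write() returns -1 with errno set to EPIPE, which the caller can handle. A minimal POSIX sketch of that behaviour; broken_pipe_errno_with_sigpipe_ignored is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: with SIGPIPE ignored, write to a pipe that has no
 * reader. Returns the errno from the failed write (EPIPE expected),
 * 0 if the write succeeded, or -1 if setup failed. */
static int broken_pipe_errno_with_sigpipe_ignored(void)
{
    int fds[2];
    int err = 0;
    ssize_t n;
    struct sigaction ign, old;

    if (pipe(fds) < 0)
        return -1;
    close(fds[0]);                  /* no process can read from the pipe now */

    memset(&ign, 0, sizeof ign);
    ign.sa_handler = SIG_IGN;       /* ignore SIGPIPE instead of dying */
    sigemptyset(&ign.sa_mask);
    if (sigaction(SIGPIPE, &ign, &old) < 0) {
        close(fds[1]);
        return -1;
    }

    n = write(fds[1], "x", 1);
    if (n < 0)
        err = errno;                /* EPIPE: reported as an error, not fatal */

    sigaction(SIGPIPE, &old, NULL); /* restore the previous disposition */
    close(fds[1]);
    return err;
}
```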
Comment 6 Jeff Trawick 2003-06-23 20:20:52 UTC
patch committed, thanks!!!!!

I'll propose it for merging into the stable branch (2.0.47-dev).
Comment 7 Jeff Trawick 2003-08-22 11:47:27 UTC
*** Bug 22483 has been marked as a duplicate of this bug. ***
Comment 8 Jeff Trawick 2003-12-13 22:28:06 UTC
*** Bug 23533 has been marked as a duplicate of this bug. ***