26467 – 'graceful' kills 'rotatelogs', then hangs long-lived httpd children

Bug 26467 - 'graceful' kills 'rotatelogs', then hangs long-lived httpd children

Summary: 'graceful' kills 'rotatelogs', then hangs long-lived httpd children

Status:	RESOLVED FIXED

Alias:	None

Product:	Apache httpd-2
Classification:	Unclassified
Component:	Core (show other bugs)
Version:	2.0.55
Hardware:	PC Windows 2000

Importance:	P3 normal with 3 votes (vote)
Target Milestone:	---
Assignee:	Apache HTTPD Bugs Mailing List

URL:
Keywords:

Duplicates (1):	40041 (view as bug list)
Depends on:
Blocks:

Reported:	2004-01-27 17:02 UTC by Ben Collins-Sussman
Modified:	2006-07-14 01:35 UTC (History)
CC List:	5 users (show)

Attachments
Kill hung child httpd procs (897 bytes, text/plain) 2005-05-13 17:34 UTC, Will Yardley	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Ben Collins-Sussman 2004-01-27 17:02:26 UTC

A graceful restart SIGTERMs the 'rotatelogs' child, but a long-lived httpd
process may still be serving a connection via KeepAlive.  The httpd child
continues to write logdata into the pipe, but there's no process reading from
the pipe anymore.  Eventually the pipe fills up, and the httpd child just hangs.

This bug was discovered when doing a large Subversion commit with apache
configured to use piped logging to 'rotatelogs'.  It's a pretty common setup,
and it's likely to burn future Apache (and Subversion) users.

Joe Orton says, "To fix this properly, I suppose piped loggers should not get
SIGTERMedduring a graceful restart, they should read till EOF then exit(0): then
when the last child attached to the piped logger for a particular generation
quits, the pipe is closed and the piped logger terminates gracefully too,
without losing log messages."

Here's the complete mail thread:

   http://www.mail-archive.com/dev%40httpd.apache.org/msg19247.html

Comment 1 Kevin Stange 2004-11-25 05:16:26 UTC

I've been pouring over Google search results trying to figure out the cause of
an extremely similar bug which seems to affect the 1.3.x branch of the Apache
HTTP server also.  Has there been any movement on this issue.  Is it likely
1.3.x would also have this problem?

Comment 2 Joe Orton 2004-11-25 10:54:45 UTC

Yes, this bug also affects 1.3.

Comment 3 Will Yardley 2005-05-13 17:34:40 UTC

Created attachment 15021 [details]
Kill hung child httpd procs

This little script (requires Proc::ProcessTable) is a hack to find child
processes other than the root one that have been around since before the last
rotatelogs process. Just in case it's useful to anyone else... can run from
cron.

Comment 4 Will Yardley 2005-05-13 17:39:56 UTC

> processes other than the root one that have been around since before the last
> rotatelogs process.

Sorry... s/last/first/

Comment 5 Jeff Trawick 2005-05-13 18:46:03 UTC

FWIW, I'm playing with a patch to prevent the hang.  Hopefully I can commit to
2.1 this weekend.

Changing rotatelogs to snarf up all entries from these gracefully dying children
would be nice, but first order of business is to prevent the hang occurring,
since it can occur with any piped logger, and it shouldn't depend on how
reliable the piped logger is.  I can't reproduce the hang when I take care to
close the read side of the logger pipe in the MPM children (i.e., the processes
which serve requests and which would write to the logger pipe).

Comment 6 Jeff Trawick 2005-05-15 15:24:10 UTC

I just posted a patch and further comments to dev@httpd.

[PATCH] PR 26467 child process can hang when piped logger goes away

Comment 7 Jeff Trawick 2005-05-16 02:48:45 UTC

A fix to avoid the hang has been committed to 2.1-dev and hopefully will be in
2.0.next.

I stuck a patch for 1.3.x at http://people.apache.org/~trawick/pr26467_13.txt

This 1.3 patch appears to Do The Right Thing, but YYMV.  I haven't tested it in
detail.

Comment 8 Kevin Stange 2005-05-19 05:56:27 UTC

I have patched the version of Apache 1.3 I am running on two web servers that
use piped loggers and I can already see that the symptoms of this problem are
gone.  There are no hanging PIDs being leftover at all anymore.  I had a script
that went through and killed the ones that were clearly hanging previously, and
I've modified it to only tell me if any would be killed for hanging.  None have
come up yet, which would certainly never happen in the past even for the short
time I've been running with the patch.

This is certainly not proof that the 1.3 patch is perfect or that it doesn't
cause any other problems, but it definitely does the right thing as best I can tell.

What's the chances something like this could land in 1.3.34?

Comment 9 Jeff Trawick 2005-05-19 12:10:18 UTC

Regarding the 1.3 patch:
Thanks for the feedback.  I need to make sure it does the right thing on
non-Unix, then I'll add it to 1.3 STATUS file, asking for approval to commit for
1.3.next.
On Unix, just looking at the processes with lsof it is easy to see that it is
doing the right thing and there should be no worries.

Comment 10 Jeff Trawick 2005-06-28 14:00:47 UTC

The 2.x fix has now been merged into the stable 2.0.x branch for the upcoming
2.0.55 release.

Comment 11 Joe Orton 2005-09-16 09:36:45 UTC

The hang is now fixed for 2.0.55, so marking this fixed.  Avoiding the problem
completely (i.e. not forcibly killing piped loggers at restart) is a slightly
different issue.

Comment 12 Matthew Sullivan 2006-01-11 16:03:54 UTC

Can someone clarify this bug for me?  We'd been having a hang about every 2-14 
days that started once we turned on the DisableWin32AcceptEx option.  We 
noticed that the hang occurred after the child process restarted for whatever 
reason.  Usually this was not triggered by a graceful restart, but 
occasionally it was.  Meaning it would show up in the logs, but a person 
didn't usually choose to do a restart, something else caused it.

We figured the build up of rotatelogs had something to do with the problem, as 
there were a ton of rotatelogs.exe sitting in there.  I noticed this bug 
mentioned in the 2.0.55 release and tried the upgrade.  We've been pretty 
stable since.  It hung once during a graceful restart after 29 days, and has 
been up for 26 days since.

I'm confident this bug was causing the hang, but still notice a tremendous 
number of rotatelogs.exe.  After a fresh reboot, there are about 6 
rotatelogs.exe is the process list.  After 26 days on Parent Server Generation 
14, we have 40+ rotatelogs.exe sitting in the process list.

When we hang, there's no way to recover without rebooting Windows.  At least 
not that I'm aware of.  Apache seems to hang, tying up the socket(s), and 
stopping and starting apache fails.

My concern is the number of accumulating rotatelogs, and the fact that we 
still hung after a month under similar circumstances during a graceful 
restart.  Of course once a month is better than every couple of days.  I just 
wanted to share what we have been experiencing with this bug, and it's 
apparent patch, and see if anyone could clarify what we are experiencing, if 
it is/was related to this bug, and why we still see so many rotatelogs.

Comment 13 Joe Orton 2006-07-14 08:35:13 UTC

*** Bug 40041 has been marked as a duplicate of this bug. ***