Bug 40041 - Bad timing between rotatelogs and httpd
Summary: Bad timing between rotatelogs and httpd
Status: RESOLVED DUPLICATE of bug 26467
Alias: None
Product: Apache httpd-1.3
Classification: Unclassified
Component: core (show other bugs)
Version: 1.3.34
Hardware: PC Linux
: P2 major (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
: 40040 (view as bug list)
Depends on:
Blocks:
 
Reported: 2006-07-13 19:33 UTC by Eric LeBlanc
Modified: 2006-07-14 01:35 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Eric LeBlanc 2006-07-13 19:33:24 UTC
Hi,

We have many web servers and they are quite busy.  Just to get the idea, it's 
not uncommon to see the size of access_log growing more over than 15 Gb per 
day.

We often change the configuration of virtual host and since we don't want to 
stop abruptly, we reload the httpd.conf by doing a graceful.

So, before going further about the issue we face every day, there is some 
lines 
of the configuration file (httpd.conf):

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %v" 
combined
CustomLog "|/path/apache/bin/rotatelogs /var/log/access_log.%Y%m%d-%H%M 
3600 -300" combined

We want to rotate the logs every hour to avoid to have some files who exceeds 
1 GB in size.

When we do a graceful, it seems that the 'rotatelogs' program is killed first.  
Thus, the httpd process can't write to the pipe and then stay stucked in the 
background indefinitely.  If we do 5-6 graceful in the same day, we can see 
more than 1000 process who are stucked in the background.

There is the listing of httpd process provided by the command 'ps'.  take note 
that today is 13 July 2006 and check the date on 12nd column and the state on 
the 11nd column:

[root@xxxxx home]# ps -efl |grep pipe_w |more
5 S xxxxx    18511 18500  0  75   0    -  3477 pipe_w Jun05 ?        
00:00:00 /xxxx/apache_1.3.34/bin/httpd
5 S xxxxx    18563 18500  0  75   0    -  3477 pipe_w Jun05 ?        
00:00:00 /xxxx/apache_1.3.34/bin/httpd
5 S xxxxx    18597 18500  0  75   0    -  3477 pipe_w Jun05 ?        
00:00:00 /xxxx/apache_1.3.34/bin/httpd
5 S xxxxx    18843 18500  0  75   0    -  3477 pipe_w Jun05 ?        
00:00:00 /xxxx/apache_1.3.34/bin/httpd
5 S xxxxx    19657 18500  0  75   0    -  3477 pipe_w Jun05 ?        
00:00:00 /xxxx/apache_1.3.34/bin/httpd
5 S xxxxx    19664 18500  0  75   0    -  3477 pipe_w Jun05 ?        
00:00:00 /xxxx/apache_1.3.34/bin/httpd
5 S xxxxx    20426 18500  0  75   0    -  3477 pipe_w Jun05 ?        
00:00:00 /xxxx/apache_1.3.34/bin/httpd
5 S xxxxx    21355 18500  0  75   0    -  3477 pipe_w Jun05 ?        
00:00:00 /xxxx/apache_1.3.34/bin/httpd
5 S xxxxx    21377 18500  0  75   0    -  3477 pipe_w Jun05 ?        
00:00:00 /xxxx/apache_1.3.34/bin/httpd
[...]

We used 'strace' command on a stucked httpd process (pid: 21377 as showed 
above):

[root@xxxx home]# strace -f -v -s2000 -p 21377
Process 21377 attached - interrupt to quit
write(8, "xxx.xx.xx.xx - - [05/Jun/2006:15:45:19 -0400] 
\"POST /xxxxx/xxxxxx/xxxxx HTTP/1.1\" 200 0 \"http://xx.xx.xx.xx/xxxx/xxxx\" 
\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)\" xxx.xxx.xxx\n", 
209 <unfinished ...>
Process 21377 detached
[root@xxxx home]#

When we used strace, it's stop there and does nothing.  It's clear that the 
httpd process is waiting the other end of the pipe to accept his log.

See the difference of the pipe address bewteen the parent (pid: 18500) and 
this child (pid: 21377):

PARENT PROCESS:
---------------
[root@xxxxx home]# cd /proc/18500/fd
[root@xxxxx fd]# ls -l |grep pipe
lr-x------    1 root     root           64 Jul 13 15:23 7 -> pipe:[4210641028]
l-wx------    1 root     root           64 Jul 13 15:23 8 -> pipe:[4210641028]

CHILD PROCESS:
--------------
[root@xxxxx home]# cd /proc/21377/fd
[root@xxxxx fd]# ls -l |grep pipe
lr-x------    1 root     root           64 Jul 13 15:23 7 -> pipe:[3562989264]
l-wx------    1 root     root           64 Jul 13 15:23 8 -> pipe:[3562989264]

We think that there is a bad timing between the mod_log and the pipe of the 
rotatelogs when we do a graceful.  It seems that the rotatelogs is killed 
first and the httpd (the child) seems doesn't handle this well.

Our quick and dirty fix is to do a ps with a grep to kill those child process 
who are hanging for a long time.

Thank you
Comment 1 Eric LeBlanc 2006-07-13 19:35:46 UTC
*** Bug 40040 has been marked as a duplicate of this bug. ***
Comment 2 Joe Orton 2006-07-14 08:35:13 UTC
This is fixed in 2.0.55 and later, see Jeff's comment for a 1.3.x patch.

*** This bug has been marked as a duplicate of 26467 ***