Hi, We have many web servers and they are quite busy. Just to get the idea, it's not uncommon to see the size of access_log growing more over than 15 Gb per day. We often change the configuration of virtual host and since we don't want to stop abruptly, we reload the httpd.conf by doing a graceful. So, before going further about the issue we face every day, there is some lines of the configuration file (httpd.conf): LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %v" combined CustomLog "|/path/apache/bin/rotatelogs /var/log/access_log.%Y%m%d-%H%M 3600 -300" combined We want to rotate the logs every hour to avoid to have some files who exceeds 1 GB in size. When we do a graceful, it seems that the 'rotatelogs' program is killed first. Thus, the httpd process can't write to the pipe and then stay stucked in the background indefinitely. If we do 5-6 graceful in the same day, we can see more than 1000 process who are stucked in the background. There is the listing of httpd process provided by the command 'ps'. take note that today is 13 July 2006 and check the date on 12nd column and the state on the 11nd column: [root@xxxxx home]# ps -efl |grep pipe_w |more 5 S xxxxx 18511 18500 0 75 0 - 3477 pipe_w Jun05 ? 00:00:00 /xxxx/apache_1.3.34/bin/httpd 5 S xxxxx 18563 18500 0 75 0 - 3477 pipe_w Jun05 ? 00:00:00 /xxxx/apache_1.3.34/bin/httpd 5 S xxxxx 18597 18500 0 75 0 - 3477 pipe_w Jun05 ? 00:00:00 /xxxx/apache_1.3.34/bin/httpd 5 S xxxxx 18843 18500 0 75 0 - 3477 pipe_w Jun05 ? 00:00:00 /xxxx/apache_1.3.34/bin/httpd 5 S xxxxx 19657 18500 0 75 0 - 3477 pipe_w Jun05 ? 00:00:00 /xxxx/apache_1.3.34/bin/httpd 5 S xxxxx 19664 18500 0 75 0 - 3477 pipe_w Jun05 ? 00:00:00 /xxxx/apache_1.3.34/bin/httpd 5 S xxxxx 20426 18500 0 75 0 - 3477 pipe_w Jun05 ? 00:00:00 /xxxx/apache_1.3.34/bin/httpd 5 S xxxxx 21355 18500 0 75 0 - 3477 pipe_w Jun05 ? 00:00:00 /xxxx/apache_1.3.34/bin/httpd 5 S xxxxx 21377 18500 0 75 0 - 3477 pipe_w Jun05 ? 00:00:00 /xxxx/apache_1.3.34/bin/httpd [...] We used 'strace' command on a stucked httpd process (pid: 21377 as showed above): [root@xxxx home]# strace -f -v -s2000 -p 21377 Process 21377 attached - interrupt to quit write(8, "xxx.xx.xx.xx - - [05/Jun/2006:15:45:19 -0400] \"POST /xxxxx/xxxxxx/xxxxx HTTP/1.1\" 200 0 \"http://xx.xx.xx.xx/xxxx/xxxx\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)\" xxx.xxx.xxx\n", 209 <unfinished ...> Process 21377 detached [root@xxxx home]# When we used strace, it's stop there and does nothing. It's clear that the httpd process is waiting the other end of the pipe to accept his log. See the difference of the pipe address bewteen the parent (pid: 18500) and this child (pid: 21377): PARENT PROCESS: --------------- [root@xxxxx home]# cd /proc/18500/fd [root@xxxxx fd]# ls -l |grep pipe lr-x------ 1 root root 64 Jul 13 15:23 7 -> pipe:[4210641028] l-wx------ 1 root root 64 Jul 13 15:23 8 -> pipe:[4210641028] CHILD PROCESS: -------------- [root@xxxxx home]# cd /proc/21377/fd [root@xxxxx fd]# ls -l |grep pipe lr-x------ 1 root root 64 Jul 13 15:23 7 -> pipe:[3562989264] l-wx------ 1 root root 64 Jul 13 15:23 8 -> pipe:[3562989264] We think that there is a bad timing between the mod_log and the pipe of the rotatelogs when we do a graceful. It seems that the rotatelogs is killed first and the httpd (the child) seems doesn't handle this well. Our quick and dirty fix is to do a ps with a grep to kill those child process who are hanging for a long time. Thank you
*** Bug 40040 has been marked as a duplicate of this bug. ***
This is fixed in 2.0.55 and later, see Jeff's comment for a 1.3.x patch. *** This bug has been marked as a duplicate of 26467 ***