Hello ! It seems there is a bug in the memory management of apache withhin a multiprocessor envoirment. First I will introduce my envoirment: I use apache 2.0.47 in default configuration (build with the workspace shipped by apache.org). The bug was reproduced on Windows NT4 with dual Processor, W2K with dual Processor and W2K with Hyperthreading processor. For reproducing the bug I modified the mod_example as followed: In x_handler() I insert a Sleep(1000); for simulating a long operation during request like this: -------------------------------------- ap_rprintf(r, " Apache HTTP Server version: \"%s\"\n", ap_get_server_version ()); ap_rputs(" <BR>\n", r); /* Simulating a long operation */ Sleep(1000); ap_rprintf(r, " Server built: \"%s\"\n", ap_get_server_built()); -------------------------------------- Then I made heavy request to normal URL’s and to mod_example simultaneous. After a while apache will fail at random in two different states. The first state is a access violation. The Log comes as follows [Thu Jun 17 16:45:47 2004] [notice] Parent: child process exited with status 3221225477 -- Restarting. [Thu Jun 17 16:45:50 2004] [notice] Parent: Created child process 3464 [Thu Jun 17 16:45:51 2004] [debug] mpm_winnt.c(505): Parent: Sent the scoreboard to the child [Thu Jun 17 16:45:53 2004] [notice] Child 3464: Child process is running [Thu Jun 17 16:45:53 2004] [info] Parent: Duplicating socket 404 and sending it to child process 3464 [Thu Jun 17 16:45:53 2004] [debug] mpm_winnt.c(426): Child 3464: Retrieved our scoreboard from the parent. [Thu Jun 17 16:45:53 2004] [debug] mpm_winnt.c(623): Parent: Sent 1 listeners to child 3464 [Thu Jun 17 16:45:53 2004] [debug] mpm_winnt.c(582): Child 3464: retrieved 1 listeners from parent [Thu Jun 17 16:45:53 2004] [notice] Child 3464: Acquired the start mutex. [Thu Jun 17 16:45:54 2004] [notice] Child 3464: Starting 25 worker threads. Where status is 0xC0000005 which means Access Violation. The access violation happens in apr_pool_walk_tree() while accessing child = pool->child; At the same time another thread tries to free the pool memory. I have no stack backtrace for this at the moment but I can try to make one, if there is a intrest for it. The other failure is that apache stops responding and stays with a processor load of about 50%. A break into code shows, that the pool is damaged. One thread stays in allocator_free() and tries to free pool memory but the next pointer of the actual node points to itself so there is a recursion where apache never gets out. This happens at this point: do { next = node->next; index = node->index; Where the node has the following content: next = 0x007dadd8 node->index = 1 node->next = 0x007dadd8 node->ref = 0x007dadd8 node->free_index = 3452816845 The Thread backtrace looks like this: allocator_free(apr_allocator_t * 0x00773dd8, apr_memnode_t * 0x007dadd8) line 362 + 6 bytes apr_pool_destroy(apr_pool_t * 0x007d8db8) line 797 + 13 bytes trace_add(server_rec * 0x0077c290, request_rec * 0x00000000, x_cfg * 0x007b7c18, const char * 0x10014908 `string') line 408 + 15 bytes x_insert_filter(request_rec * 0x007d2d48) line 997 + 23 bytes ap_run_insert_filter(request_rec * 0x007d2d48) line 121 + 31 bytes ap_invoke_handler(request_rec * 0x6ff09466) line 374 ap_process_http_connection(conn_rec * 0x6ff03f8f) line 293 + 6 bytes ap_run_process_connection(conn_rec * 0x007f51c8) line 85 + 31 bytes ap_process_connection(conn_rec * 0x007f51c8, void * 0x007f5100) line 211 + 6 bytes worker_main(long 2013300156) line 731 MSVCRT! 780085bc() Another Thread stays in this stack state: NTDLL! 77894091() NTDLL! 778922f8() allocator_alloc(apr_allocator_t * 0x00773dd8, unsigned int 8192) line 242 apr_pool_create_ex(apr_pool_t * * 0x007d51dc, apr_pool_t * 0x007d4d48, int (int) * 0x00000000, apr_allocator_t * 0x00773dd8) line 829 + 14 bytes core_output_filter(ap_filter_t * 0x6ff182c1, apr_bucket_brigade * 0x007d5198) line 4108 ap_pass_brigade(ap_filter_t * 0x007d5198, apr_bucket_brigade * 0x007ed5a0) line 550 + 7 bytes ap_http_header_filter(ap_filter_t * 0x6ff182c1, apr_bucket_brigade * 0x007ab1f0) line 1695 ap_pass_brigade(ap_filter_t * 0x007ab1f0, apr_bucket_brigade * 0x007ed408) line 550 + 7 bytes ap_content_length_filter(ap_filter_t * 0x6ff182c1, apr_bucket_brigade * 0x007ab1d8) line 1252 + 20 bytes ap_pass_brigade(ap_filter_t * 0x007ab1d8, apr_bucket_brigade * 0x007ed408) line 550 + 7 bytes ap_byterange_filter(ap_filter_t * 0x6ff182c1, apr_bucket_brigade * 0x007ab1c0) line 3036 + 5 bytes ap_pass_brigade(ap_filter_t * 0x007ab1c0, apr_bucket_brigade * 0x007ed408) line 550 + 7 bytes ap_old_write_filter(ap_filter_t * 0x007ed3f0, apr_bucket_brigade * 0x007ed528) line 1321 + 10 bytes end_output_stream(request_rec * 0x007aa550) line 1039 + 29 bytes ap_finalize_request_protocol(request_rec * 0x007aa550) line 1061 + 6 bytes ap_send_error_response(request_rec * 0x6ff0d26f, int 404) line 2423 + 6 bytes ap_die(int 1878053609, request_rec * 0x00000000) line 232 + 11 bytes ap_process_request(request_rec * 0x007aa550) line 311 ap_process_http_connection(conn_rec * 0x6ff03f8f) line 293 + 6 bytes ap_run_process_connection(conn_rec * 0x007d4e48) line 85 + 31 bytes ap_process_connection(conn_rec * 0x007d4e48, void * 0x007d4d80) line 211 + 6 bytes worker_main(long 2013300156) line 731 MSV and another Thread looks like this: allocator_alloc(apr_allocator_t * 0x00773dd8, unsigned int 8192) line 242 apr_pool_create_ex(apr_pool_t * * 0x0156ff28, apr_pool_t * 0x007d6d80, int (int) * 0x00000000, apr_allocator_t * 0x00773dd8) line 829 + 14 bytes ap_read_request(conn_rec * 0x6ff09431) line 848 ap_process_http_connection(conn_rec * 0x6ff03f8f) line 286 + 6 bytes ap_run_process_connection(conn_rec * 0x007d6e80) line 85 + 31 bytes ap_process_connection(conn_rec * 0x007d6e80, void * 0x007d6db8) line 211 + 6 bytes worker_main(long 2013300156) line 731 MSVCRT! 780085bc() all other threads stays in winnt_get_connection() So in my oppinion, there is a race condition where one thread tries to free the pool memory and another thread tries to access this memory at the same time. It seems that this behavior can only be reproduced on a multiprocessor or hyperthreading machine. I don’t know if it affects to other OS then Windows. Greetings Gabriel Kalkuhl
Hello, I am also having the same issue. If i change the ThreadsPerChild then i am able to vary how long the server will run before restarting. While it restarts it fails to serve any pages and returns error in browser. System ------------------- Dell Dual P3 1gz Win2k with all updates as of 10/25/2004 Apache 2.0.52 OpenSSL 0.9.7d PHP (tried many version and all have same issue from 4.3.2 - 5.0.2 and 5.1.x CVS) Any advice or help in focusing this issue would be very helpfull. Thank You, David [Sun Oct 24 11:52:37 2004] [notice] Parent: child process exited with status 1 -- Restarting. [Sun Oct 24 11:52:40 2004] [notice] Parent: Created child process 4060 [Sun Oct 24 11:52:40 2004] [notice] Disabled use of AcceptEx() WinSock2 API [Sun Oct 24 11:52:43 2004] [notice] Child 4060: Child process is running [Sun Oct 24 11:52:43 2004] [notice] Child 4060: Acquired the start mutex. [Sun Oct 24 11:52:43 2004] [notice] Child 4060: Starting 1920 worker threads. [Sun Oct 24 11:52:44 2004] [notice] Child 4060: Listening on port 443. [Sun Oct 24 11:52:44 2004] [notice] Child 4060: Listening on port 80. [Sun Oct 24 11:52:44 2004] [notice] Child 4060: Listening on port 80. [Sun Oct 24 11:58:17 2004] [notice] Parent: child process exited with status 1 -- Restarting. [Sun Oct 24 11:58:20 2004] [notice] Parent: Created child process 10080 [Sun Oct 24 11:58:20 2004] [notice] Disabled use of AcceptEx() WinSock2 API [Sun Oct 24 11:58:23 2004] [notice] Child 10080: Child process is running [Sun Oct 24 11:58:23 2004] [notice] Child 10080: Acquired the start mutex. [Sun Oct 24 11:58:23 2004] [notice] Child 10080: Starting 1920 worker threads. [Sun Oct 24 11:58:23 2004] [notice] Child 10080: Listening on port 443. [Sun Oct 24 11:58:23 2004] [notice] Child 10080: Listening on port 80. [Sun Oct 24 11:58:23 2004] [notice] Child 10080: Listening on port 80.
Hello David! I think many of the listed PHP errors for Apache 2 depends on this Bug. I wonder why Apache team members ignore this bug! However for fixing it, you have to change the memory management to protect the pool memory. I've not the time to do this and it's a bit dangerous because of race conditions and dead lock states. This bug should be fixed by someone who created this pool management. The Bug fix for me was to completely switch back to Apache 1. Sorry, not really helpful Gabriel
apr_pool_destroy(apr_pool_t * 0x007d8db8) line 797 + 13 bytes trace_add(server_rec * 0x0077c290, request_rec * 0x00000000, x_cfg *...) "Then I made heavy request to normal URL’s and to mod_example simultaneous." it looks very much like mod_example is not thread-safe; so this may just be a good demonstration of that. Do you have an example of a crash using any module other than mod_example?
I agree with switching back to apache 1. I'm currently in the process of compiling the latest stable version of apache 1 with ssl after getting a few emails from some colleagues of mine with the same issue. It is unfortunate that this issue is so ingrained into the apache2 version. It may just be my imagination, but I believe that the apache team is focusing on apache2. If this is true then for people like us to upgrade this is a mandatory fix, I just can’t have customers getting errors every 5.5 minutes. Thnx for the fast response and I would love to know if you or the Apache team ever decides this is really an issue they would like to fix. - David
All that the "Child process exited with status 3221225477 -- Restarting." messages means is that a child process crashed. This may be because of a bug in a module, or it may be because of a bug in httpd -- it's impossible to tell without a backtrace. If you can either: a) get a simple reproduction case for the problem, or b) get a backtrace from a crash there is a chance of looking at this further.
I concur with Joe's comment that mod_example is not thread-safe. (Any threaded Apache with mod_example should be expected to crash.) As he says, we need a recipe to reproduce, or other crash data, from a configuration without mod_example .
Hello Joe, yes, the error appeared to me first by using mod_cplusplus with EasySoap++. I implemented a SOAP server with this modules. My module worked fine for some time. On the first dual processor machine we installed this software it crashed. After some reviews in the cplusplus module and EasySoap++ I found that the error appers in the apache pool management. For better demonstration I implemented the delay in mod_example. It will cras as soon as you make simultanous requests to normal http service and mod_example. By the way, I saw some other bug reports from poeple using PHP with the same behavior (see bug 24095). Are there some restrictions on making 'long' operations during request handling? Gabriel
Sorry, comments where made at the same time. Sorry Jeff, but I'm a bit confused about hearing that mod_example isn't thread safe. Mod_example doesn't do anything like piping some text to the output. The only modification I've done is to insert a sleep between the output lines so the module behaves like doing a slow database acces (which originaly was my intention). Mod_example doesn't crash if I remove the sleep. If this is not thread safe, then all modules which generate some text output aren't thread safe. Gabriel
Look at mod_example's trace_add function and how it stores pointers to pools in global variables - it is not a well-behaved example module at all! The first thing to do must be to take mod_example completely out of the picture, and try and trigger whatever problem it is you're seeing. Yes, PHP users report crashes too, sometimes these are PHP bugs and sometimes not.
Bug #32051 is a tracking bug for mod_example not being thread safe. This bug is invalid because the issue comes from a user basing their module on mod_example.
Your right! I removed all the trace_add stuff in mod_example and it works fine. No more crashes anymore! By the way, I'm a bit dissapointed about mod_example. Normaly example means, that someone wíll show how to do something. In this case mod_example should be titled: "Here we show you, how to write bad code and how you shouldn't write modules". Now I can go on searching on mod_cplusplus, which is only third party for apache. Thanks for spending time on my mistakes! Gabriel
I've implemented the one-minute fix you suggested, by adding a warning to mod_example http://cvs.apache.org/viewcvs.cgi/httpd-2.0/modules/experimental/mod_example.c