I am running the SPECweb99 static content workload with httpd 2.2.6 on Solaris Nevada (snv_79) and am seeing several crashes. Typically a crash reproduces within 10 minutes. Here are the details:

Apache version: httpd-2.2.6
Simultaneous connections: 1000
Hardware: X4100 server (4-core, 2.8 GHz)
CPU: only a single core is enabled
Architecture: x86_64

httpd.conf contains:

<IfModule worker.c>
    ListenBackLog 50000
    StartServers 2
    ThreadLimit 500
    ThreadsPerChild 500
    MinSpareThreads 100
    MaxSpareThreads 100
    ThreadsPerChild 500
    MaxClients 1000
    MaxRequestsPerChild 0
</IfModule>
Listen 192.168.21.1:80
Listen 192.168.22.1:80

Configure options:

CFLAGS="-g -mt -m64 -KPIC" ./configure --prefix=<prefix_path> --with-mpm=worker --enable-modules=all --with-ssl=/usr/sfw --enable-mods-shared=all --enable-cgi --enable-threads && gmake && gmake install

Here is the most common stack trace.

Crash 1:

(dbx) where
current thread: t@76
=>[1] allocator_free(allocator = 0x101f870, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0x102fb88), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0x1020550, b = 0x101f9e8), line 899 in "core_filters.c"
  [4] ap_pass_brigade(next = 0x1020550, bb = 0x101f9e8), line 526 in "util_filter.c"
  [5] logio_out_filter(f = 0x10204e0, bb = 0x101f9e8), line 135 in "mod_logio.c"
  [6] ap_pass_brigade(next = 0x10204e0, bb = 0x101f9e8), line 526 in "util_filter.c"
  [7] ap_flush_conn(c = 0x101fd00), line 84 in "connection.c"
  [8] ap_lingering_close(c = 0x101fd00), line 123 in "connection.c"
  [9] process_socket(p = 0x101f968, sock = 0x101f9e8, my_child_num = 1, my_thread_num = 227, bucket_alloc = 0x1029a88), line 545 in "worker.c"
  [10] worker_thread(thd = 0x5bed38, dummy = 0x6dbac0), line 894 in "worker.c"
  [11] dummy_worker(opaque = 0x5bed38), line 142 in "thread.c"
  [12] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [13] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0

Crash 2:

(dbx) where
current thread: t@363
=>[1] apr_palloc(pool = 0x21680007952225ff, size = 18446744073323675656U), line 601 in "apr_pools.c"
  [2] apr_sockaddr_ip_get(addr = 0xcda3d0, sockaddr = 0x42d790), line 104 in "sockaddr.c"
  [3] core_create_conn(ptrans = 0xcda2d8, server = 0x4bf600, csd = 0xcda358, id = 360, sbh = 0xcda378, alloc = 0xd147e8), line 3895 in "core.c"
  [4] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x45fe03
  [5] process_socket(p = 0xcda2d8, sock = 0xcda358, my_child_num = 0, my_thread_num = 360, bucket_alloc = 0xd147e8), line 542 in "worker.c"
  [6] worker_thread(thd = 0x7192f8, dummy = 0x7e45a0), line 894 in "worker.c"
  [7] dummy_worker(opaque = 0x7192f8), line 142 in "thread.c"
  [8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0

I tried httpd-2.2.8 too but got a similar crash.

Crash 3 (with httpd-2.2.8):

=>[1] apr_palloc(pool = 0x226800079e7a25ff, size = 18446744073323675656U), line 630 in "apr_pools.c"
  [2] apr_sockaddr_ip_get(addr = 0xc57060, sockaddr = 0x42dab8), line 104 in "sockaddr.c"
  [3] core_create_conn(ptrans = 0xc56f68, server = 0x4c0378, csd = 0xc56fe8, id = 951, sbh = 0xc57008, alloc = 0xc58f78), line 3895 in "core.c"
  [4] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x4604e3
  [5] process_socket(p = 0xc56f68, sock = 0xc56fe8, my_child_num = 1, my_thread_num = 451, bucket_alloc = 0xc58f78), line 542 in "worker.c"
  [6] worker_thread(thd = 0x870c88, dummy = 0x7e7e30), line 894 in "worker.c"
  [7] dummy_worker(opaque = 0x870c88), line 142 in "thread.c"
  [8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0

The prefork MPM works just fine.
I tried to debug the crash:

=>[1] allocator_free(allocator = 0x101f870, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0x102fb88), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0x1020550, b = 0x101f9e8), line 899 in "core_filters.c"

In ap_core_output_filter, the crash happens when apr_pool_clear is called for the deferred_write_pool:

    apr_pool_clear(ctx->deferred_write_pool);

On further investigation I found that for ctx->deferred_write_pool, pool->ref points to pool->next, i.e. pool->ref == &pool->next. Thus in apr_pool_clear:

    if (active->next == active)
        return;

    *active->ref = NULL;   /* ---> this sets active->next to NULL, because
                            *      active->ref points to active->next */
    allocator_free(pool->allocator, active->next);

This situation does not arise for a normal single connection; it happens only under stress. For a normal connection, active->next == active and apr_pool_clear returns early (when called for ctx->deferred_write_pool).
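For readers not familiar with the pool internals, here is a minimal sketch (simplified, assumption-based, and not the actual apr_pools.c code) of the invariant the active memnode is supposed to satisfy and why the state observed above is fatal:

    /* Simplified sketch of an APR pool memnode ring; field names follow
     * apr_memnode_t but this is illustrative only.
     * Invariant in a healthy pool: the nodes form a circular singly linked
     * list and *node->ref == node for every node, i.e. ref points at the
     * 'next' field that links to this node. */
    struct memnode_sketch {
        struct memnode_sketch *next;   /* next node in the ring                */
        struct memnode_sketch **ref;   /* address of the 'next' field pointing *
                                        * at this node                         */
    };

    static void pool_clear_sketch(struct memnode_sketch *active)
    {
        if (active->next == active)    /* single node: nothing to free         */
            return;

        *active->ref = NULL;           /* unlink the chain hanging off active. */
        /* In the corrupted state seen above, ref == &active->next although
         * next != active, so this very assignment nulls active->next, and the
         * subsequent allocator_free(allocator, active->next) is handed NULL,
         * matching "node = (nil)" in the backtrace. */
    }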
Crashes are happening with 32-bit Apache too, so I am changing the summary.
Here is the crash from the 32-bit Apache:

=>[1] allocator_free(allocator = 0x8aae018, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0x8b629b8), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0x8aae870, b = 0x8aae0e0), line 899 in "core_filters.c"
  [4] ap_pass_brigade(next = 0x8aae870, bb = 0x8aae0e0), line 526 in "util_filter.c"
  [5] logio_out_filter(f = 0x8aae830, bb = 0x8aae0e0), line 135 in "mod_logio.c"
  [6] ap_pass_brigade(next = 0x8aae830, bb = 0x8aae0e0), line 526 in "util_filter.c"
  [7] ap_flush_conn(c = 0x8aae390), line 84 in "connection.c"
  [8] ap_lingering_close(c = 0x8aae390), line 123 in "connection.c"
  [9] process_socket(p = 0x8aae0a0, sock = 0x8aae0e0, my_child_num = 1, my_thread_num = 249, bucket_alloc = 0x8b5c9a0), line 545 in "worker.c"
  [10] worker_thread(thd = 0x81a6788, dummy = 0x831f5a0), line 894 in "worker.c"
  [11] dummy_worker(opaque = 0x81a6788), line 142 in "thread.c"
  [12] _thr_setup(0xf004d200), at 0xfec6f282
  [13] _lwp_start(0xfee7ddb9, 0xfee7f55e, 0xfffffff6, 0x0, 0x1, 0xfeea1984), at 0xfec6f4e0
Here is the debug information from a crash of the 32-bit Apache:

t@414 (l@414) terminated by signal SEGV (Segmentation Fault)
Current function is apr_sockaddr_ip_get
  104       *addr = apr_palloc(sockaddr->pool, sockaddr->addr_str_len);
(dbx) where
current thread: t@414
=>[1] apr_sockaddr_ip_get(addr = 0x974a3d0, sockaddr = (nil)), line 104 in "sockaddr.c"
  [2] core_create_conn(ptrans = 0x974a348, server = 0x80d9020, csd = 0x974a388, id = 411, sbh = 0x974a398, alloc = 0x9788670), line 3895 in "core.c"
  [3] ap_run_create_connection(0x974a348, 0x80d9020, 0x974a388, 0x19b, 0x974a398, 0x9788670), at 0x8090ae8
  [4] process_socket(p = 0x974a348, sock = 0x974a388, my_child_num = 0, my_thread_num = 411, bucket_alloc = 0x9788670), line 542 in "worker.c"
  [5] worker_thread(thd = 0x83a6ff8, dummy = 0x8125e80), line 894 in "worker.c"
  [6] dummy_worker(opaque = 0x83a6ff8), line 142 in "thread.c"
  [7] _thr_setup(0xf008e200), at 0xfec6f282
  [8] _lwp_start(0x0, 0xfee8410c, 0xe451bef8, 0xe451bef8, 0x8081a83, 0x974a3d0), at 0xfec6f4e0
(dbx) p sockaddr
sockaddr = (nil)
(dbx) where
current thread: t@414
=>[1] apr_sockaddr_ip_get(addr = 0x974a3d0, sockaddr = (nil)), line 104 in "sockaddr.c"
  [2] core_create_conn(ptrans = 0x974a348, server = 0x80d9020, csd = 0x974a388, id = 411, sbh = 0x974a398, alloc = 0x9788670), line 3895 in "core.c"
  [3] ap_run_create_connection(0x974a348, 0x80d9020, 0x974a388, 0x19b, 0x974a398, 0x9788670), at 0x8090ae8
  [4] process_socket(p = 0x974a348, sock = 0x974a388, my_child_num = 0, my_thread_num = 411, bucket_alloc = 0x9788670), line 542 in "worker.c"
  [5] worker_thread(thd = 0x83a6ff8, dummy = 0x8125e80), line 894 in "worker.c"
  [6] dummy_worker(opaque = 0x83a6ff8), line 142 in "thread.c"
  [7] _thr_setup(0xf008e200), at 0xfec6f282
  [8] _lwp_start(0x0, 0xfee8410c, 0xe451bef8, 0xe451bef8, 0x8081a83, 0x974a3d0), at 0xfec6f4e0
(dbx) up
Current function is core_create_conn
 3895       apr_sockaddr_ip_get(&c->local_ip, c->local_addr);
(dbx) p *c
*c = { pool = 0x974a348 base_server = (nil) vhost_lookup_data = (nil) local_addr = (nil) remote_addr = (nil) remote_ip = (nil) remote_host = (nil) remote_logname = (nil) aborted = 0 keepalive = AP_CONN_UNKNOWN double_reverse = 0 keepalives = 0 local_ip = (nil) local_host = (nil) id = 0 conn_config = 0x974a400 notes = 0x974a6a0 input_filters = (nil) output_filters = (nil) sbh = 0x974a398 bucket_alloc = (nil) cs = (nil) data_in_input_filters = 0 }
(dbx) dump
alloc = 0x9788670
rv = 0
ptrans = 0x974a348
server = 0x80d9020
sbh = 0x974a398
c = 0x974a3a0
id = 411
csd = 0x974a388
(dbx) _arch_networkio.h`struct apr_socket_t*)csd <
*((struct apr_socket_t *) csd) = { pool = (nil) socketdes = 158893680 type = -17726080 protocol = 134660748 local_addr = (nil) remote_addr = 0x19b timeout = 158638920LL local_port_unknown = 0 local_interface_unknown = 0 remote_addr_unknown = 0 options = 0 inherit = 0 userdata = (nil) }

Please let me know if any other information is needed.
First guess for your last crash in comment #4 (all line numbers are for 2.2.8):

    lr->accept_func(&csd, lr, ptrans);

(line 742 in worker.c) fails with rv != APR_SUCCESS, but with a non-NULL value for csd. In contrast to the code in prefork we don't check this situation.

Lines 621 - 631 of prefork.c:

            status = lr->accept_func(&csd, lr, ptrans);

            SAFE_ACCEPT(accept_mutex_off());      /* unlock after "accept" */

            if (status == APR_EGENERAL) {
                /* resource shortage or should-not-occur occured */
                clean_child_exit(1);
            }
            else if (status != APR_SUCCESS) {
                continue;
            }

Maybe we need to do a continue in the worker case as well, or we need to do something like the following:

Index: server/mpm/worker/worker.c
===================================================================
--- server/mpm/worker/worker.c  (Revision 627576)
+++ server/mpm/worker/worker.c  (Arbeitskopie)
@@ -743,6 +743,9 @@
         /* later we trash rv and rely on csd to indicate success/failure */
         AP_DEBUG_ASSERT(rv == APR_SUCCESS || !csd);

+        if (rv != APR_SUCCESS) {
+            csd = NULL;
+        }
         if (rv == APR_EGENERAL) {
             /* E[NM]FILE, ENOMEM, etc */
             resource_shortage = 1;
I did the stress test with the patch you suggested. With your patch, I still got the 1st crash. If I hit the crash with the second stack trace, I will update the bug.

Here is some more information about the 1st crash.

* I am able to reproduce the crash on Solaris 10 update 1 (on a different machine) too. It took around 4 hours of stress before I got the crash on Solaris 10, whereas it takes less than 30 minutes to reproduce on Solaris Nevada. It was crash 1 (allocator_free with node = null), without your patch.

Here is more information on crash 1:

(dbx) where
current thread: t@21
=>[1] allocator_free(allocator = 0x8afe2e0, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0xa0d01c0), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0xa0b28c8, b = 0xa0b2a08), line 899 in "core_filters.c"
  [4] ap_pass_brigade(next = 0xa0b28c8, bb = 0xa0b2a08), line 526 in "util_filter.c"
  [5] logio_out_filter(f = 0xa0b2888, bb = 0xa0b2a08), line 135 in "mod_logio.c"
  [6] ap_pass_brigade(next = 0xa0b2888, bb = 0xa0b2a08), line 526 in "util_filter.c"
  [7] ap_flush_conn(c = 0xa0b23e8), line 84 in "connection.c"
  [8] ap_lingering_close(c = 0xa0b23e8), line 123 in "connection.c"
  [9] process_socket(p = 0x8afe368, sock = 0x8aff660, my_child_num = 1, my_thread_num = 18, bucket_alloc = 0xa0be178), line 545 in "worker.c"
  [10] worker_thread(thd = 0x81487d8, dummy = 0x8117b30), line 894 in "worker.c"
  [11] dummy_worker(opaque = 0x81487d8), line 142 in "thread.c"
  [12] _thr_setup(0xfe244800), at 0xfeccf92e
  [13] _lwp_start(), at 0xfeccfc10
(dbx) up
Current function is apr_pool_clear
  710       allocator_free(pool->allocator, active->next);
(dbx) p *active
*active = { next = (nil) ref = 0xa0d01a8 index = 1U free_index = 0 first_avail = 0xa0d01f8 "\xc0^A^M\n\xfc^A^M\n\xfc^A^M\nx\xe1^K\n" endp = 0xa0d21a8 "^A " }
(dbx) up
Current function is ap_core_output_filter
  899       apr_pool_clear(ctx->deferred_write_pool);
(dbx) p *ctx
*ctx = { b = (nil) deferred_write_pool = 0xa0d01c0 }
(dbx) p *ctx->deferred_write_pool
*ctx->deferred_write_pool = { parent = 0x8afe368 child = (nil) sibling = 0xa0c6198 ref = 0x8afe36c cleanups = (nil) free_cleanups = (nil) allocator = 0x8afe2e0 subprocesses = (nil) abort_fn = (nil) user_data = (nil) tag = 0x80bfd1c "deferred_write" active = 0xa0d01a8 self = 0xa0d01a8 self_first_avail = 0xa0d01f8 "\xc0^A^M\n\xfc^A^M\n\xfc^A^M\nx\xe1^K\n" }
(dbx) p *c
*c = { pool = 0x8afe368 base_server = 0x80e6bf8 vhost_lookup_data = (nil) local_addr = 0x8aff698 remote_addr = 0x8aff7c0 remote_ip = 0xa0b2850 "192.168.11.1" remote_host = (nil) remote_logname = (nil) aborted = 0 keepalive = AP_CONN_KEEPALIVE double_reverse = 0 keepalives = 1 local_ip = 0xa0b2840 "192.168.11.2" local_host = (nil) id = 518 conn_config = 0xa0b2448 notes = 0xa0b26e8 input_filters = 0xa0b2870 output_filters = 0xa0b2888 sbh = 0xa0b23e0 bucket_alloc = 0xa0be178 cs = (nil) data_in_input_filters = 0 }

On putting some printfs in, I figured out the following. In apr_pool_clear (when invoked for the deferred_write_pool):

    active = pool->active = pool->self;
    active->first_avail = pool->self_first_avail;

    if (active->next == active)
        return;

active->next should typically form a circular linked list. What is happening in some cases is that active->next points to something else while active->ref still points to active->next. I put a printf of active->next before it is set to NULL. For a particular crash, here is my debugging session; I found that active->next was set to 0x20e8810 before it was set to NULL.
(dbx) up
Current function is apr_pool_clear
  774       allocator_free(pool->allocator, active->next);
(dbx) up
Current function is ap_core_output_filter
  923       apr_pool_clear(ctx->deferred_write_pool);
(dbx) p (struct apr_memnode_t*)0x20e8810        -----> This was active->next before it was set to NULL.
(struct apr_memnode_t *) 0x20e8810 = 0x20e8810
(dbx) p *(struct apr_memnode_t*)0x20e8810
*((struct apr_memnode_t *) 0x20e8810) = { next = 0x288c5b0 ref = 0x20e8810 index = 1U free_index = 0 first_avail = 0x20e9eb0 "GET /file_set/dir00104/class1_3 HTTP/1.0" endp = 0x20ea810 "^A " }
(dbx) down
Current function is apr_pool_clear
  774       allocator_free(pool->allocator, active->next);
(dbx) p active
active = 0x20e27e0
(dbx) p *((struct apr_memnode_t*)0x20e8810)->next
*((struct apr_memnode_t *) 0x20e8810)->next = { next = 0x20e07d0 ref = 0x20e07d0 index = 1U free_index = 0 first_avail = 0x288d008 "" endp = 0x288e5b0 "^A " }
(dbx) p active
active = 0x20e27e0
(dbx) p *(((struct apr_memnode_t*)0x20e8810)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next = { next = 0x28905d0 ref = 0x288c5b0 index = 1U free_index = 0 first_avail = 0x20e2738 "" endp = 0x20e27d0 "^A " }
(dbx) p *((((struct apr_memnode_t*)0x20e8810)->next)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next->next = { next = 0x288e5c0 ref = 0x28905d0 index = 1U free_index = 0 first_avail = 0x2890668 "\xf8^E\x89^B" endp = 0x28925d0 "^Q^P" }
(dbx) p *(((((struct apr_memnode_t*)0x20e8810)->next)->next)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next->next->next = { next = (nil) ref = (nil) index = 1U free_index = 0 first_avail = 0x288e5e8 "`^_" endp = 0x28905c0 "^A " }

On further debugging, I figured out that ap_core_output_filter is typically called 4 times for a request. The crash always happens in the 4th invocation. It seems to me that the pool gets corrupted somewhere after the 3rd invocation (after it returns from ap_core_output_filter) and before it enters ap_core_output_filter for the 4th time (when ap_lingering_close is in the call stack). Also, conn->keepalives was always set to 1.
With Ruediger's patch, I still got the crash (#2). Here is the debug information:

t@314 (l@314) terminated by signal SEGV (Segmentation Fault)
Current function is apr_sockaddr_ip_get
  104       *addr = apr_palloc(sockaddr->pool, sockaddr->addr_str_len);
(dbx) where
current thread: t@314
=>[1] apr_sockaddr_ip_get(addr = 0x1ebb4b0, sockaddr = (nil)), line 104 in "sockaddr.c"
  [2] core_create_conn(ptrans = 0x1ebb3b8, server = 0x4c0200, csd = 0x1ebb728, id = 311, sbh = 0x1ebb458, alloc = 0x2128b58), line 3895 in "core.c"
  [3] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x4602c3
  [4] process_socket(p = 0x1ebb3b8, sock = 0x1ebb728, my_child_num = 0, my_thread_num = 311, bucket_alloc = 0x2128b58), line 566 in "worker.c"
  [5] worker_thread(thd = 0x7195c8, dummy = 0x6e2310), line 923 in "worker.c"
  [6] dummy_worker(opaque = 0x7195c8), line 142 in "thread.c"
  [7] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [8] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0
(dbx) p *addr
*addr = (nil)
(dbx) up
Current function is core_create_conn
 3895       apr_sockaddr_ip_get(&c->local_ip, c->local_addr);
(dbx) p *c
*c = { pool = 0x1ebb3b8 base_server = (nil) vhost_lookup_data = (nil) local_addr = (nil) remote_addr = (nil) remote_ip = (nil) remote_host = (nil) remote_logname = (nil) aborted = 0 keepalive = AP_CONN_UNKNOWN double_reverse = 0 keepalives = 0 local_ip = (nil) local_host = (nil) id = 0 conn_config = 0x1ebb508 notes = 0x1ebba48 input_filters = (nil) output_filters = (nil) sbh = 0x1ebb458 bucket_alloc = (nil) cs = (nil) data_in_input_filters = 0 }
(dbx) dump
alloc = 0x2128b58
rv = 0
ptrans = 0x1ebb3b8
server = 0x4c0200
sbh = 0x1ebb458
c = 0x1ebb460
id = 311
csd = 0x1ebb728
(dbx) p csd
csd = 0x1ebb728
(dbx) p *(struct apr_socket_t*) csd
*((struct apr_socket_t *) csd) = { pool = (nil) socketdes = 0 type = 0 protocol = 0 local_addr = (nil) remote_addr = (nil) timeout = 0 local_port_unknown = 0 local_interface_unknown = 0 remote_addr_unknown = 0 options = 0 inherit = 0 userdata = (nil) }
I assume that the ptrans pool somehow gets corrupted. I guess it is used by two threads in parallel, which could lead to corruption since pools as such are not thread-safe. So I think a good starting point for further investigation would be ap_queue_info_wait_for_idler in mpm/worker/fdqueue.c, or lines 731 - 740 in worker.c:

        if (ptrans == NULL) {
            /* we can't use a recycled transaction pool this time.
             * create a new transaction pool */
            apr_allocator_t *allocator;

            apr_allocator_create(&allocator);
            apr_allocator_max_free_set(allocator, ap_max_mem_free);
            apr_pool_create_ex(&ptrans, pconf, NULL, allocator);
            apr_allocator_owner_set(allocator, ptrans);
        }
Thanks Ruediger for your suggestion. I will try to explore based on your suggestion. Meanwhile here is the 3rd type of crash (with your patch). t@13 (l@13) terminated by signal SEGV (Segmentation Fault) Current function is apr_pool_cleanup_kill 2045 c = p->cleanups; (dbx) where current thread: t@13 =>[1] apr_pool_cleanup_kill(p = 0xa0, data = 0x195b888, cleanup_fn = 0xfffffd7fff223540 = &`libapr-1.so.0.2.11`sockets.c`socket_cleanup(void *sock)), line 2045 in "apr_pools.c" [2] apr_pool_cleanup_run(p = 0xa0, data = 0x195b888, cleanup_fn = 0xfffffd7fff223540 = &`libapr-1.so.0.2.11`sockets.c`socket_cleanup(void *sock)), line 2088 in "apr_pools.c" [3] apr_socket_close(thesocket = 0x195b888), line 149 in "sockets.c" [4] ap_lingering_close(c = 0x17407f0), line 135 in "connection.c" [5] process_socket(p = 0x1740748, sock = 0x195b888, my_child_num = 1, my_thread_num = 10, bucket_alloc = 0x195b728), line 569 in "worker.c" [6] worker_thread(thd = 0x52bb48, dummy = 0x4f3480), line 951 in "worker.c" [7] dummy_worker(opaque = 0x52bb48), line 142 in "thread.c" [8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7 [9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0 (dbx) up Current function is apr_pool_cleanup_run 2088 apr_pool_cleanup_kill(p, data, cleanup_fn); (dbx) up Current function is apr_socket_close 149 return apr_pool_cleanup_run(thesocket->pool, thesocket, socket_cleanup); (dbx) p *thesocket *thesocket = { pool = 0xa0 socketdes = 26588968 type = 0 protocol = 26588928 local_addr = 0x195b7e8 remote_addr = 0x1741158 timeout = 24383832 local_port_unknown = 4895848 local_interface_unknown = 0 remote_addr_unknown = 0 options = 0 inherit = 0 userdata = (nil) } (dbx) dump thesocket = 0x195b888 (dbx) up Current function is ap_lingering_close 135 apr_socket_close(csd); (dbx) dump timeup = 0 dummybuf = "" c = 0x17407f0 nbytes = 4294967296U csd = 0x195b888 (dbx) p *c *c = { pool = 0x1740748 base_server = 0x4c0300 vhost_lookup_data = (nil) local_addr = 0x195b8d8 remote_addr = 0x195ba18 remote_ip = 0x1740f88 "192.168.22.2" remote_host = (nil) remote_logname = (nil) aborted = 0 keepalive = AP_CONN_UNKNOWN double_reverse = 0 keepalives = 0 local_ip = 0x1740f78 "192.168.22.1" local_host = (nil) id = 510 conn_config = 0x1740898 notes = 0x1740dd8 input_filters = 0x1740fa8 output_filters = 0x1740fd0 sbh = 0x17407e8 bucket_alloc = 0x195b728 cs = (nil) data_in_input_filters = 0 } (dbx) _arch_networkio.h`struct apr_socket_t*)csd < *((struct apr_socket_t *) csd) = { pool = 0xa0 socketdes = 26588968 type = 0 protocol = 26588928 local_addr = 0x195b7e8 remote_addr = 0x1741158 timeout = 24383832 local_port_unknown = 4895848 local_interface_unknown = 0 remote_addr_unknown = 0 options = 0 inherit = 0 userdata = (nil) } (dbx) etworkio.h`struct apr_socket_t*)csd->local_addr < dbx: can't find field "local_addr" in "*(csd)" (dbx) p ((`srclib/apr/include/arch/unix/apr_arch_networkio.h`struct apr_socke > ((struct apr_socket_t *) csd)->local_addr = 0x195b7e8 (dbx) p *((`srclib/apr/include/arch/unix/apr_arch_networkio.h`struct apr_sock > *((struct apr_socket_t *) csd)->local_addr = { pool = 0xa0 hostname = 0x195b728 "H^Gt^A" servname = 0x195b700 "" port = 0 family = 0 salen = 24383744U ipaddr_len = 0 addr_str_len = 24383744 ipaddr_ptr = 0xfffffd7fff2e8010 next = (nil) sa = { sin = { sin_family = 0 sin_port = 0 sin_addr = { S_un = { S_un_b = { s_b1 = '\0' s_b2 = '\0' s_b3 = '\0' s_b4 = '\0' } S_un_w = { s_w1 = 0 s_w2 = 0 } S_addr = 0 } } sin_zero = "" } sin6 = { sin6_family = 0 sin6_port = 0 
sin6_flowinfo = 0 sin6_addr = { _S6_un = { _S6_u8 = "" _S6_u32 = (0, 0, 4373928U, 0) __S6_align = 0 } } sin6_scope_id = 26588968U __sin6_src_id = 0 } sas = { ss_family = 0 _ss_pad1 = "" _ss_align = 0.0 _ss_pad2 = "xxB" } } }
As you wrote Solaris on Sun, I suppose you mean SPARC. Have you checked whether the crashes happen with the same Solaris version on x86? Background of the question: ap_queue_info_wait_for_idler uses atomics whose implementation depends on the hardware architecture. Non-functional atomics could be a source of concurrency problems under load.
(In reply to comment #10)
> As you wrote Solaris on Sun, I suppose you mean SPARC. Have you checked whether the
> crashes happen with the same Solaris version on x86? Background of the question:

Oops, my fault: you already said that you are using x86. Nevertheless, does the same happen on SPARC with the same Solaris version, or if you compile with --enable-nonportable-atomics=no?
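For reference, apr_atomic_casptr is a compare-and-swap on a pointer. The following is a rough sketch of the semantics it is expected to provide; it is illustrative only and not APR's actual implementation, which performs the whole operation indivisibly (via a native instruction, or under an internal mutex in the generic fallback):

    /* Illustrative only: the observable behaviour of a pointer compare-and-swap.
     * Atomically: if *mem equals cmp, store 'with' into *mem; in all cases
     * return the value *mem held before the operation. Callers detect success
     * by checking that the returned value equals cmp. */
    static void *casptr_semantics(void *volatile *mem, void *with, const void *cmp)
    {
        void *old = *mem;       /* the read, compare and store must happen   */
        if (old == cmp)         /* as one indivisible step in a real         */
            *mem = with;        /* implementation; they are written          */
        return old;             /* sequentially here only to show the idea   */
    }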
Thanks, Ruediger, for your pointer. It was really useful. Regarding the function ap_queue_info_wait_for_idler (lines 188-196):

188:         struct recycled_pool *first_pool = queue_info->recycled_pools;
189:         if (first_pool == NULL) {
190:             break;
191:         }
192:         if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
193:                               first_pool) == first_pool) {
194:             *recycled_pool = first_pool->pool;
195:             break;
196:         }

I will abbreviate queue_info->recycled_pools as qi->rp to keep it short. Inside apr_atomic_casptr we acquire a mutex, so I will describe this as three steps:

1. Calculate first_pool.
2. Calculate first_pool->next and invoke apr_atomic_casptr.
3. Acquire the lock on qi->rp (inside apr_atomic_casptr).

There is a clear race condition between the two statements (line 188 and line 193), i.e. between steps 2 and 3. I agree that &queue_info->recycled_pools itself is protected and updated atomically, but the next pointers (first_pool and first_pool->next) are not protected correctly.

To prove my point, here is an example. Suppose at a particular moment the recycled pool list is 1 --> 2 --> 3, where 1, 2, 3 are pool nodes, so qi->rp = 1. Now consider the following interleaving:

Thread 1: first_pool = 1; first_pool->next = 2. Before step 3 is executed, i.e. before the lock on qi->rp is acquired, a context switch happens.
Thread 2: pops a node (1) from the list, so the list becomes 2 --> 3.
Thread 3: pops another node (2) from the list, so the list becomes 3.
Thread 2: pushes its node back, so the list becomes 1 --> 3.
Thread 1: first_pool->next is still 2 and qi->rp is still 1. The thread acquires the lock and atomically compares qi->rp with 1 and swaps in 2. The CAS succeeds because qi->rp was 1, but the head now becomes 2 even though node 2 has already been handed out, so the list becomes 2 (or 2 --> 3) and is corrupted.

I believe I can prove my point with a sample standalone application. For now, I used a separate mutex and protected both qi->rp and the next pointers. I tried the attached patch; with it, I was able to run the stress test for more than 10 hours without any crash. Without the patch, the crash used to happen in less than 30 minutes.
Here is the patch which I tried:

---------------------------------------------------------------------------
--- orghttpd-2.2.6/server/mpm/worker/fdqueue.c  Wed Jul 25 06:13:49 2007
+++ httpd-2.2.6/server/mpm/worker/fdqueue.c     Fri Feb 15 10:57:42 2008
@@ -25,6 +25,7 @@
 struct fd_queue_info_t {
     apr_uint32_t idlers;
     apr_thread_mutex_t *idlers_mutex;
+    apr_thread_mutex_t *queue_mutex;
     apr_thread_cond_t *wait_for_idler;
     int terminated;
     int max_idlers;
@@ -36,6 +37,7 @@
     fd_queue_info_t *qi = data_;
     apr_thread_cond_destroy(qi->wait_for_idler);
     apr_thread_mutex_destroy(qi->idlers_mutex);
+    apr_thread_mutex_destroy(qi->queue_mutex);

     /* Clean up any pools in the recycled list */
     for (;;) {
@@ -65,6 +67,11 @@
     if (rv != APR_SUCCESS) {
         return rv;
     }
+    rv = apr_thread_mutex_create(&qi->queue_mutex, APR_THREAD_MUTEX_DEFAULT,
+                                 pool);
+    if (rv != APR_SUCCESS) {
+        return rv;
+    }
     rv = apr_thread_cond_create(&qi->wait_for_idler, pool);
     if (rv != APR_SUCCESS) {
         return rv;
@@ -93,14 +100,14 @@
         new_recycle = (struct recycled_pool *)apr_palloc(pool_to_recycle,
                                                          sizeof(*new_recycle));
         new_recycle->pool = pool_to_recycle;
-        for (;;) {
-            new_recycle->next = queue_info->recycled_pools;
-            if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
-                                  new_recycle, new_recycle->next) ==
-                new_recycle->next) {
-                break;
-            }
-        }
+        rv = apr_thread_mutex_lock(queue_info->queue_mutex);
+        if (rv != APR_SUCCESS)
+            return rv;
+        new_recycle->next = queue_info->recycled_pools;
+        queue_info->recycled_pools = new_recycle;
+        rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
+        if (rv != APR_SUCCESS)
+            return rv;
     }

     /* Atomically increment the count of idle workers */
@@ -182,19 +189,18 @@

     /* Atomically decrement the idle worker count */
     apr_atomic_dec32(&(queue_info->idlers));
-
-    /* Atomically pop a pool from the recycled list */
-    for (;;) {
+    rv = apr_thread_mutex_lock(queue_info->queue_mutex);
+    if (rv != APR_SUCCESS)
+        return rv;
+    if (queue_info->recycled_pools) {
         struct recycled_pool *first_pool = queue_info->recycled_pools;
-        if (first_pool == NULL) {
-            break;
-        }
-        if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
-            first_pool) == first_pool) {
-            *recycled_pool = first_pool->pool;
-            break;
-        }
+        queue_info->recycled_pools = first_pool->next;
+        *recycled_pool = first_pool->pool;
+        first_pool->next = NULL;
     }
+    rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
+    if (rv != APR_SUCCESS)
+        return rv;

     if (queue_info->terminated) {
         return APR_EOF;
---------------------------------------------------------------------------

If you agree that there is a clear race condition, then to correct it I have the following suggestions:

(a) Use a dedicated pool for each worker thread; this avoids any locking. It will perform better but may require a little more memory in situations where the worker threads are not fully used. (A rough sketch of this idea follows below.)
(b) Use some technique other than a recycled pool list that avoids race conditions.

I am in favour of option (a) until some good idea for (b) comes to my mind. If you agree with (a), I can work on it and generate a patch.

Note: I also believe that the crash will happen on Linux too. I never ran the test for more than 1 hour on Linux; I will try that tonight.
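The following is a hypothetical sketch of option (a) as described above: each worker thread owning one long-lived transaction pool and clearing it in place between connections, instead of handing pools back through a shared recycled list. It is illustrative only, not a proposed patch; the function and variable names are made up for the example, and it glosses over how the listener thread would obtain the pool at accept time. ap_max_mem_free is httpd's existing MaxMemFree setting.

    #include "apr_pools.h"
    #include "apr_allocator.h"
    #include "apr_thread_proc.h"
    #include "mpm_common.h"   /* for ap_max_mem_free */

    /* Hypothetical illustration of option (a): one private transaction pool
     * per worker thread, cleared after every connection, so no shared
     * recycled-pool list (and no atomic push/pop) is needed. */
    static void * APR_THREAD_FUNC worker_thread_sketch(apr_thread_t *thd, void *arg)
    {
        apr_allocator_t *allocator;
        apr_pool_t *ptrans;
        int terminated = 0;              /* set on graceful shutdown (omitted) */

        apr_allocator_create(&allocator);
        apr_allocator_max_free_set(allocator, ap_max_mem_free);
        apr_pool_create_ex(&ptrans, NULL, NULL, allocator);  /* thread-private */
        apr_allocator_owner_set(allocator, ptrans);

        while (!terminated) {
            /* ... wait for a connection handed over by the listener and
             * process it using ptrans ... */
            apr_pool_clear(ptrans);      /* recycle in place, no shared list */
        }

        apr_pool_destroy(ptrans);
        apr_thread_exit(thd, APR_SUCCESS);
        return NULL;
    }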
(In reply to comment #12)
> If you agree that there is a clear race condition, then to correct it I have
> the following suggestions:
> (a) Use a dedicated pool for each worker thread; this avoids any locking.
> It will perform better but may require a little more memory in situations
> where the worker threads are not fully used.
> (b) Use some technique other than a recycled pool list that avoids race
> conditions.
>
> I am in favour of option (a) until some good idea for (b) comes to my mind.
> If you agree with (a), I can work on it and generate a patch.
>
> Note: I also believe that the crash will happen on Linux too. I never ran the
> test for more than 1 hour on Linux; I will try that tonight.

Thank you for your thorough investigation. I agree with you that we have the described race conditions here. We have a similar race in the event MPM.

Next steps:

1. Bring your patch above into trunk. Currently I see no significant performance loss over the current code, as we are using a mutex there as well; we only increase the time during which we lock the resource. I don't know right now when I will find the cycles to apply the patch to trunk, but if you could attach a trunk version of your patch to this report it would be a big help.

2. Move the further discussion regarding options a) and b) to dev@httpd.apache.org, and let's wait for its results to decide how to move along and improve the situation here in the long run.
I think I have to correct myself on two points.

1. On APR trunk there are better implementations of apr_atomic_casptr which no longer use a mutex, but native platform processor / OS features. So, in contrast to my first assumption, there could be a performance degradation from your patch on trunk, which would be bad.

2. The race scenario you described cannot happen in this way, because it assumes that multiple threads pop pools from the list in parallel. This is not the case, as only the listener thread does this. What happens in parallel is:

- multiple pushes to the list
- (multiple) pushes to the list and a pop

OTOH I still believe that there is some kind of race scenario, as your patch showed that the error goes away if the locking / syncing is changed here. So maybe it's only a different scenario (that I haven't figured out so far), or there is a bug in apr_atomic_casptr. Do the same crashes happen with trunk?
Regarding the example given in comment #12, I need to correct myself. I agree with you that the example is not valid for the worker implementation, because there is a single thread that pops nodes and multiple threads that push them. (ap_queue_info_wait_for_idler is not thread-safe, but it is not called by multiple threads; it is only invoked by the single listener thread.) I have not yet been able to think of a race condition in which a single popping thread and several pushing threads corrupt the recycled_pool list. I am still working on finding the real cause of the crashes.
A few more updates:

* These crashes probably exist on Linux (64-bit) too, but I can't say for sure. I have seen 3 crashes so far; out of those 3, I got a core dump only once, and the stack trace from that core dump didn't make much sense to me, so I can't say for sure whether the bug reproduces on Linux. (Linux here is 64-bit Fedora 8 with a 64-bit Apache.)

On Solaris, I tried the following things:

* Replaced apr_atomic_casptr with Solaris's atomic_casptr, but the result remained the same; I still saw the crashes. This suggests that this may not be an APR bug.

* If I replace the apr_atomic_casptr code but keep the for loop, the crashes disappear.

In ap_queue_info_set_idle:

    if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
                          new_recycle, new_recycle->next) ==
        new_recycle->next) {
        break;
    }

replaced with:

    rv = apr_thread_mutex_lock(queue_info->queue_mutex);
    if (queue_info->recycled_pools == new_recycle->next) {
        queue_info->recycled_pools = new_recycle;
        success = 1;
    }
    rv = apr_thread_mutex_unlock(queue_info->queue_mutex);

In ap_queue_info_wait_for_idler:

    if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
                          first_pool) == first_pool) {
        *recycled_pool = first_pool->pool;
        break;
    }

replaced with:

    rv = apr_thread_mutex_lock(queue_info->queue_mutex);
    if (queue_info->recycled_pools == first_pool) {
        queue_info->recycled_pools = next;
        success = 1;
    }
    rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
Created attachment 21581: Patch for httpd-2.2.8

Eventually I figured out where the real race condition is. Lines 96-102 of fdqueue.c (in httpd-2.2.8):

    for (;;) {
        new_recycle->next = queue_info->recycled_pools;
        if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
                              new_recycle, new_recycle->next) ==
            new_recycle->next) {
            break;
        }
    }

The race condition is between the return of apr_atomic_casptr and the re-evaluation of new_recycle->next. Let us break the loop body into four steps (qi->rp is queue_info->recycled_pools):

1. Set new_recycle->next to qi->rp.
2. Atomically compare qi->rp with new_recycle->next and swap in new_recycle if it matches.
3. Calculate new_recycle->next again.
4. Decide whether the apr_atomic_casptr call was successful based on the return value and the result of step 3.

The race condition is between step 2 and step 3. If apr_atomic_casptr was successful (i.e. it swapped the value) and there is a context switch between steps 2 and 3, then new_recycle->next can point to something else and can even be corrupted, with the result that the if condition fails. I saved new_recycle->next in a local variable and then used that local variable, as shown in the patch, and the issue was resolved.

Here is an example of how new_recycle->next can be changed by a race. Suppose our list is 1 --> 2 --> 3, where 1, 2, 3 are list nodes, and worker thread 1 wants to add a node 4 at the head. Here is how it goes:

Worker thread 1: new_recycle = 4; qi->rp = 1; new_recycle->next = 1. apr_atomic_casptr successfully compares and swaps, so qi->rp = 4 and the list becomes 4 --> 1 --> 2 --> 3. Now a context switch happens.
Listener thread: qi->rp is 4, so it pops node 4 and gives it to worker thread 2; the list becomes 1 --> 2 --> 3. The listener thread pops another node and gives it to worker thread 3, and the list becomes 2 --> 3.
Worker thread 2: returns node 4 to the list, and the list becomes 4 --> 2 --> 3.
Worker thread 1: new_recycle->next has now become 2; the code compares it with 1 (the value returned by the CAS), and hence the comparison fails.

The real situation can be a little different from what I described, because before the worker thread returns node 4 to the list, the pool is cleared (line 897 of worker.c, in the worker_thread function):

    apr_pool_clear(ptrans);
    last_ptrans = ptrans;

which means new_recycle->next can be corrupted and point to freed memory.

The way I confirmed this is that if you put an assert statement like:

    struct recycled_pool *next = queue_info->recycled_pools;
    new_recycle->next = next;
    if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
                          new_recycle, new_recycle->next) == next) {
        ap_assert(next == new_recycle->next);
        break;
    }

then the assertion fails under stress.

The bug also exists in the event MPM (server/mpm/experimental/event/fdqueue.c).

Ruediger, can you review the patch? The patch is against 2.2.8. Should I submit a patch against trunk?
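To make the fix concrete, here is a minimal sketch of the corrected push loop; it is a simplified rendering of what the attached patch does rather than a verbatim quote of it. The key point is that the head pointer is read exactly once into a local variable, and that snapshot is used both as the new node's next pointer and as the CAS comparand, so nothing is re-read after the swap.

    for (;;) {
        /* single snapshot of the head; never re-read after the CAS */
        struct recycled_pool *next = queue_info->recycled_pools;

        new_recycle->next = next;
        if (apr_atomic_casptr((volatile void **)&(queue_info->recycled_pools),
                              new_recycle, next) == next) {
            break;  /* push succeeded: head advanced from 'next' to new_recycle */
        }
        /* another thread changed the head meanwhile; retry with a fresh snapshot */
    }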
Since the crash happens on Linux too, I am changing the summary.
Created attachment 21582: Revised patch

Made a small correction in the comments of the patch.
Fixed in trunk in r630335.
Great work Basant. Nick beat me to committing your patch, but in the meantime I applied your patch to the event MPM on trunk as well (r630348).
*** Bug 44474 has been marked as a duplicate of this bug. ***
Backported to 2.2.x as r631362 (http://svn.apache.org/viewvc?rev=631362&view=rev).
*** Bug 42086 has been marked as a duplicate of this bug. ***
*** Bug 41030 has been marked as a duplicate of this bug. ***