I am running the SPECweb99 static content workload with httpd 2.2.6 on Solaris Nevada (snv_79) and am seeing several crashes. Typically a crash reproduces within 10 minutes. Here are the details:

Apache version: httpd-2.2.6
Simultaneous connections: 1000
Hardware: X4100 server (4-core, 2.8 GHz)
CPU: only a single core is enabled
Architecture: x86_64

httpd.conf contains:

<IfModule worker.c>
    ListenBackLog 50000
    StartServers 2
    ThreadLimit 500
    ThreadsPerChild 500
    MinSpareThreads 100
    MaxSpareThreads 100
    ThreadsPerChild 500
    MaxClients 1000
    MaxRequestsPerChild 0
</IfModule>
Listen 192.168.21.1:80
Listen 192.168.22.1:80

Configure options:

CFLAGS="-g -mt -m64 -KPIC" ./configure --prefix=<prefix_path> --with-mpm=worker --enable-modules=all --with-ssl=/usr/sfw --enable-mods-shared=all --enable-cgi --enable-threads && gmake && gmake install

Here is the most common stack trace.

Crash 1:

(dbx) where
current thread: t@76
=>[1] allocator_free(allocator = 0x101f870, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0x102fb88), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0x1020550, b = 0x101f9e8), line 899 in "core_filters.c"
  [4] ap_pass_brigade(next = 0x1020550, bb = 0x101f9e8), line 526 in "util_filter.c"
  [5] logio_out_filter(f = 0x10204e0, bb = 0x101f9e8), line 135 in "mod_logio.c"
  [6] ap_pass_brigade(next = 0x10204e0, bb = 0x101f9e8), line 526 in "util_filter.c"
  [7] ap_flush_conn(c = 0x101fd00), line 84 in "connection.c"
  [8] ap_lingering_close(c = 0x101fd00), line 123 in "connection.c"
  [9] process_socket(p = 0x101f968, sock = 0x101f9e8, my_child_num = 1, my_thread_num = 227, bucket_alloc = 0x1029a88), line 545 in "worker.c"
  [10] worker_thread(thd = 0x5bed38, dummy = 0x6dbac0), line 894 in "worker.c"
  [11] dummy_worker(opaque = 0x5bed38), line 142 in "thread.c"
  [12] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [13] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0

Crash 2:

(dbx) where
current thread: t@363
=>[1] apr_palloc(pool = 0x21680007952225ff, size = 18446744073323675656U), line 601 in "apr_pools.c"
  [2] apr_sockaddr_ip_get(addr = 0xcda3d0, sockaddr = 0x42d790), line 104 in "sockaddr.c"
  [3] core_create_conn(ptrans = 0xcda2d8, server = 0x4bf600, csd = 0xcda358, id = 360, sbh = 0xcda378, alloc = 0xd147e8), line 3895 in "core.c"
  [4] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x45fe03
  [5] process_socket(p = 0xcda2d8, sock = 0xcda358, my_child_num = 0, my_thread_num = 360, bucket_alloc = 0xd147e8), line 542 in "worker.c"
  [6] worker_thread(thd = 0x7192f8, dummy = 0x7e45a0), line 894 in "worker.c"
  [7] dummy_worker(opaque = 0x7192f8), line 142 in "thread.c"
  [8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0

I tried httpd-2.2.8 too but got a similar crash.

Crash 3 (with httpd-2.2.8):

=>[1] apr_palloc(pool = 0x226800079e7a25ff, size = 18446744073323675656U), line 630 in "apr_pools.c"
  [2] apr_sockaddr_ip_get(addr = 0xc57060, sockaddr = 0x42dab8), line 104 in "sockaddr.c"
  [3] core_create_conn(ptrans = 0xc56f68, server = 0x4c0378, csd = 0xc56fe8, id = 951, sbh = 0xc57008, alloc = 0xc58f78), line 3895 in "core.c"
  [4] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x4604e3
  [5] process_socket(p = 0xc56f68, sock = 0xc56fe8, my_child_num = 1, my_thread_num = 451, bucket_alloc = 0xc58f78), line 542 in "worker.c"
  [6] worker_thread(thd = 0x870c88, dummy = 0x7e7e30), line 894 in "worker.c"
  [7] dummy_worker(opaque = 0x870c88), line 142 in "thread.c"
  [8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0

The prefork MPM works just fine.
I tried to debug the crash:

=>[1] allocator_free(allocator = 0x101f870, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0x102fb88), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0x1020550, b = 0x101f9e8), line 899 in "core_filters.c"

In ap_core_output_filter, the crash happens when apr_pool_clear is called for the deferred_write_pool:

    apr_pool_clear(ctx->deferred_write_pool);

On further investigation I found that for ctx->deferred_write_pool, pool->ref points to pool->next, i.e. pool->ref == &pool->next. Thus in apr_pool_clear:

    if (active->next == active)
        return;

    *active->ref = NULL;   /* ---> this sets active->next to NULL, because
                            *      active->ref points to active->next */
    allocator_free(pool->allocator, active->next);

This situation does not arise for a normal single connection; it happens only under stress. For a normal connection, active->next == active and apr_pool_clear returns early (when called for ctx->deferred_write_pool).
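For readers not familiar with the pool internals, here is a minimal sketch (simplified, assumption-based, and not the actual apr_pools.c code) of the invariant the active memnode is supposed to satisfy and why the state observed above is fatal:

    /* Simplified sketch of an APR pool memnode ring; field names follow
     * apr_memnode_t but this is illustrative only.
     * Invariant in a healthy pool: the nodes form a circular singly linked
     * list and *node->ref == node for every node, i.e. ref points at the
     * 'next' field that links to this node. */
    struct memnode_sketch {
        struct memnode_sketch *next;   /* next node in the ring                */
        struct memnode_sketch **ref;   /* address of the 'next' field pointing *
                                        * at this node                         */
    };

    static void pool_clear_sketch(struct memnode_sketch *active)
    {
        if (active->next == active)    /* single node: nothing to free         */
            return;

        *active->ref = NULL;           /* unlink the chain hanging off active. */
        /* In the corrupted state seen above, ref == &active->next although
         * next != active, so this very assignment nulls active->next, and the
         * subsequent allocator_free(allocator, active->next) is handed NULL,
         * matching "node = (nil)" in the backtrace. */
    }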
Crashes are happening with 32-bit Apache too, so I am changing the summary.
Here is the crash from the 32-bit Apache:

=>[1] allocator_free(allocator = 0x8aae018, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0x8b629b8), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0x8aae870, b = 0x8aae0e0), line 899 in "core_filters.c"
  [4] ap_pass_brigade(next = 0x8aae870, bb = 0x8aae0e0), line 526 in "util_filter.c"
  [5] logio_out_filter(f = 0x8aae830, bb = 0x8aae0e0), line 135 in "mod_logio.c"
  [6] ap_pass_brigade(next = 0x8aae830, bb = 0x8aae0e0), line 526 in "util_filter.c"
  [7] ap_flush_conn(c = 0x8aae390), line 84 in "connection.c"
  [8] ap_lingering_close(c = 0x8aae390), line 123 in "connection.c"
  [9] process_socket(p = 0x8aae0a0, sock = 0x8aae0e0, my_child_num = 1, my_thread_num = 249, bucket_alloc = 0x8b5c9a0), line 545 in "worker.c"
  [10] worker_thread(thd = 0x81a6788, dummy = 0x831f5a0), line 894 in "worker.c"
  [11] dummy_worker(opaque = 0x81a6788), line 142 in "thread.c"
  [12] _thr_setup(0xf004d200), at 0xfec6f282
  [13] _lwp_start(0xfee7ddb9, 0xfee7f55e, 0xfffffff6, 0x0, 0x1, 0xfeea1984), at 0xfec6f4e0
Here is the debug information from a crash of the 32-bit Apache:

t@414 (l@414) terminated by signal SEGV (Segmentation Fault)
Current function is apr_sockaddr_ip_get
  104       *addr = apr_palloc(sockaddr->pool, sockaddr->addr_str_len);
(dbx) where
current thread: t@414
=>[1] apr_sockaddr_ip_get(addr = 0x974a3d0, sockaddr = (nil)), line 104 in "sockaddr.c"
  [2] core_create_conn(ptrans = 0x974a348, server = 0x80d9020, csd = 0x974a388, id = 411, sbh = 0x974a398, alloc = 0x9788670), line 3895 in "core.c"
  [3] ap_run_create_connection(0x974a348, 0x80d9020, 0x974a388, 0x19b, 0x974a398, 0x9788670), at 0x8090ae8
  [4] process_socket(p = 0x974a348, sock = 0x974a388, my_child_num = 0, my_thread_num = 411, bucket_alloc = 0x9788670), line 542 in "worker.c"
  [5] worker_thread(thd = 0x83a6ff8, dummy = 0x8125e80), line 894 in "worker.c"
  [6] dummy_worker(opaque = 0x83a6ff8), line 142 in "thread.c"
  [7] _thr_setup(0xf008e200), at 0xfec6f282
  [8] _lwp_start(0x0, 0xfee8410c, 0xe451bef8, 0xe451bef8, 0x8081a83, 0x974a3d0), at 0xfec6f4e0
(dbx) p sockaddr
sockaddr = (nil)
(dbx) where
current thread: t@414
=>[1] apr_sockaddr_ip_get(addr = 0x974a3d0, sockaddr = (nil)), line 104 in "sockaddr.c"
  [2] core_create_conn(ptrans = 0x974a348, server = 0x80d9020, csd = 0x974a388, id = 411, sbh = 0x974a398, alloc = 0x9788670), line 3895 in "core.c"
  [3] ap_run_create_connection(0x974a348, 0x80d9020, 0x974a388, 0x19b, 0x974a398, 0x9788670), at 0x8090ae8
  [4] process_socket(p = 0x974a348, sock = 0x974a388, my_child_num = 0, my_thread_num = 411, bucket_alloc = 0x9788670), line 542 in "worker.c"
  [5] worker_thread(thd = 0x83a6ff8, dummy = 0x8125e80), line 894 in "worker.c"
  [6] dummy_worker(opaque = 0x83a6ff8), line 142 in "thread.c"
  [7] _thr_setup(0xf008e200), at 0xfec6f282
  [8] _lwp_start(0x0, 0xfee8410c, 0xe451bef8, 0xe451bef8, 0x8081a83, 0x974a3d0), at 0xfec6f4e0
(dbx) up
Current function is core_create_conn
 3895       apr_sockaddr_ip_get(&c->local_ip, c->local_addr);
(dbx) p *c
*c = { pool = 0x974a348 base_server = (nil) vhost_lookup_data = (nil) local_addr = (nil) remote_addr = (nil) remote_ip = (nil) remote_host = (nil) remote_logname = (nil) aborted = 0 keepalive = AP_CONN_UNKNOWN double_reverse = 0 keepalives = 0 local_ip = (nil) local_host = (nil) id = 0 conn_config = 0x974a400 notes = 0x974a6a0 input_filters = (nil) output_filters = (nil) sbh = 0x974a398 bucket_alloc = (nil) cs = (nil) data_in_input_filters = 0 }
(dbx) dump
alloc = 0x9788670
rv = 0
ptrans = 0x974a348
server = 0x80d9020
sbh = 0x974a398
c = 0x974a3a0
id = 411
csd = 0x974a388
(dbx) _arch_networkio.h`struct apr_socket_t*)csd <
*((struct apr_socket_t *) csd) = { pool = (nil) socketdes = 158893680 type = -17726080 protocol = 134660748 local_addr = (nil) remote_addr = 0x19b timeout = 158638920LL local_port_unknown = 0 local_interface_unknown = 0 remote_addr_unknown = 0 options = 0 inherit = 0 userdata = (nil) }

Please let me know if any other information is needed.
First guess for your last crash in comment #4 (all line numbers are for 2.2.8):

    lr->accept_func(&csd, lr, ptrans);

(line 742 in worker.c) fails with rv != APR_SUCCESS, but with a non-NULL value for csd. In contrast to the code in prefork we don't check this situation.

Lines 621 - 631 of prefork.c:

            status = lr->accept_func(&csd, lr, ptrans);

            SAFE_ACCEPT(accept_mutex_off());      /* unlock after "accept" */

            if (status == APR_EGENERAL) {
                /* resource shortage or should-not-occur occured */
                clean_child_exit(1);
            }
            else if (status != APR_SUCCESS) {
                continue;
            }

Maybe we need to do a continue in the worker case as well, or we need to do something like the following:

Index: server/mpm/worker/worker.c
===================================================================
--- server/mpm/worker/worker.c  (Revision 627576)
+++ server/mpm/worker/worker.c  (Arbeitskopie)
@@ -743,6 +743,9 @@
         /* later we trash rv and rely on csd to indicate success/failure */
         AP_DEBUG_ASSERT(rv == APR_SUCCESS || !csd);

+        if (rv != APR_SUCCESS) {
+            csd = NULL;
+        }
         if (rv == APR_EGENERAL) {
             /* E[NM]FILE, ENOMEM, etc */
             resource_shortage = 1;
I did the stress test with the patch you suggested. With your patch, I still got the 1st crash. If I hit the crash with the second stack trace, I will update the bug.

Here is some more information about the 1st crash.

* I am able to reproduce the crash on Solaris 10 update 1 (on a different machine) too. It took around 4 hours of stress before I got the crash on Solaris 10, whereas it takes less than 30 minutes to reproduce on Solaris Nevada. It was crash 1 (allocator_free with node = null), without your patch.

Here is more information on crash 1:

(dbx) where
current thread: t@21
=>[1] allocator_free(allocator = 0x8afe2e0, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0xa0d01c0), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0xa0b28c8, b = 0xa0b2a08), line 899 in "core_filters.c"
  [4] ap_pass_brigade(next = 0xa0b28c8, bb = 0xa0b2a08), line 526 in "util_filter.c"
  [5] logio_out_filter(f = 0xa0b2888, bb = 0xa0b2a08), line 135 in "mod_logio.c"
  [6] ap_pass_brigade(next = 0xa0b2888, bb = 0xa0b2a08), line 526 in "util_filter.c"
  [7] ap_flush_conn(c = 0xa0b23e8), line 84 in "connection.c"
  [8] ap_lingering_close(c = 0xa0b23e8), line 123 in "connection.c"
  [9] process_socket(p = 0x8afe368, sock = 0x8aff660, my_child_num = 1, my_thread_num = 18, bucket_alloc = 0xa0be178), line 545 in "worker.c"
  [10] worker_thread(thd = 0x81487d8, dummy = 0x8117b30), line 894 in "worker.c"
  [11] dummy_worker(opaque = 0x81487d8), line 142 in "thread.c"
  [12] _thr_setup(0xfe244800), at 0xfeccf92e
  [13] _lwp_start(), at 0xfeccfc10
(dbx) up
Current function is apr_pool_clear
  710       allocator_free(pool->allocator, active->next);
(dbx) p *active
*active = { next = (nil) ref = 0xa0d01a8 index = 1U free_index = 0 first_avail = 0xa0d01f8 "\xc0^A^M\n\xfc^A^M\n\xfc^A^M\nx\xe1^K\n" endp = 0xa0d21a8 "^A " }
(dbx) up
Current function is ap_core_output_filter
  899       apr_pool_clear(ctx->deferred_write_pool);
(dbx) p *ctx
*ctx = { b = (nil) deferred_write_pool = 0xa0d01c0 }
(dbx) p *ctx->deferred_write_pool
*ctx->deferred_write_pool = { parent = 0x8afe368 child = (nil) sibling = 0xa0c6198 ref = 0x8afe36c cleanups = (nil) free_cleanups = (nil) allocator = 0x8afe2e0 subprocesses = (nil) abort_fn = (nil) user_data = (nil) tag = 0x80bfd1c "deferred_write" active = 0xa0d01a8 self = 0xa0d01a8 self_first_avail = 0xa0d01f8 "\xc0^A^M\n\xfc^A^M\n\xfc^A^M\nx\xe1^K\n" }
(dbx) p *c
*c = { pool = 0x8afe368 base_server = 0x80e6bf8 vhost_lookup_data = (nil) local_addr = 0x8aff698 remote_addr = 0x8aff7c0 remote_ip = 0xa0b2850 "192.168.11.1" remote_host = (nil) remote_logname = (nil) aborted = 0 keepalive = AP_CONN_KEEPALIVE double_reverse = 0 keepalives = 1 local_ip = 0xa0b2840 "192.168.11.2" local_host = (nil) id = 518 conn_config = 0xa0b2448 notes = 0xa0b26e8 input_filters = 0xa0b2870 output_filters = 0xa0b2888 sbh = 0xa0b23e0 bucket_alloc = 0xa0be178 cs = (nil) data_in_input_filters = 0 }

On putting some printfs in, I figured out the following. In apr_pool_clear (when invoked for the deferred_write_pool):

    active = pool->active = pool->self;
    active->first_avail = pool->self_first_avail;

    if (active->next == active)
        return;

active->next should typically form a circular linked list. What is happening in some cases is that active->next points to something else while active->ref still points to active->next. I put a printf of active->next before it is set to NULL. For a particular crash, here is my debugging session; I found that active->next was set to 0x20e8810 before it was set to NULL.
(dbx) up
Current function is apr_pool_clear
  774       allocator_free(pool->allocator, active->next);
(dbx) up
Current function is ap_core_output_filter
  923       apr_pool_clear(ctx->deferred_write_pool);
(dbx) p (struct apr_memnode_t*)0x20e8810        -----> This was active->next before it was set to NULL.
(struct apr_memnode_t *) 0x20e8810 = 0x20e8810
(dbx) p *(struct apr_memnode_t*)0x20e8810
*((struct apr_memnode_t *) 0x20e8810) = { next = 0x288c5b0 ref = 0x20e8810 index = 1U free_index = 0 first_avail = 0x20e9eb0 "GET /file_set/dir00104/class1_3 HTTP/1.0" endp = 0x20ea810 "^A " }
(dbx) down
Current function is apr_pool_clear
  774       allocator_free(pool->allocator, active->next);
(dbx) p active
active = 0x20e27e0
(dbx) p *((struct apr_memnode_t*)0x20e8810)->next
*((struct apr_memnode_t *) 0x20e8810)->next = { next = 0x20e07d0 ref = 0x20e07d0 index = 1U free_index = 0 first_avail = 0x288d008 "" endp = 0x288e5b0 "^A " }
(dbx) p active
active = 0x20e27e0
(dbx) p *(((struct apr_memnode_t*)0x20e8810)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next = { next = 0x28905d0 ref = 0x288c5b0 index = 1U free_index = 0 first_avail = 0x20e2738 "" endp = 0x20e27d0 "^A " }
(dbx) p *((((struct apr_memnode_t*)0x20e8810)->next)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next->next = { next = 0x288e5c0 ref = 0x28905d0 index = 1U free_index = 0 first_avail = 0x2890668 "\xf8^E\x89^B" endp = 0x28925d0 "^Q^P" }
(dbx) p *(((((struct apr_memnode_t*)0x20e8810)->next)->next)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next->next->next = { next = (nil) ref = (nil) index = 1U free_index = 0 first_avail = 0x288e5e8 "`^_" endp = 0x28905c0 "^A " }

On further debugging, I figured out that ap_core_output_filter is typically called 4 times for a request. The crash always happens in the 4th invocation. It seems to me that the pool gets corrupted somewhere after the 3rd invocation (after it returns from ap_core_output_filter) and before it enters ap_core_output_filter for the 4th time (when ap_lingering_close is in the call stack). Also, conn->keepalives was always set to 1.
With Ruediger's patch, I still got the crash (#2). Here is the debug information:

t@314 (l@314) terminated by signal SEGV (Segmentation Fault)
Current function is apr_sockaddr_ip_get
  104       *addr = apr_palloc(sockaddr->pool, sockaddr->addr_str_len);
(dbx) where
current thread: t@314
=>[1] apr_sockaddr_ip_get(addr = 0x1ebb4b0, sockaddr = (nil)), line 104 in "sockaddr.c"
  [2] core_create_conn(ptrans = 0x1ebb3b8, server = 0x4c0200, csd = 0x1ebb728, id = 311, sbh = 0x1ebb458, alloc = 0x2128b58), line 3895 in "core.c"
  [3] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x4602c3
  [4] process_socket(p = 0x1ebb3b8, sock = 0x1ebb728, my_child_num = 0, my_thread_num = 311, bucket_alloc = 0x2128b58), line 566 in "worker.c"
  [5] worker_thread(thd = 0x7195c8, dummy = 0x6e2310), line 923 in "worker.c"
  [6] dummy_worker(opaque = 0x7195c8), line 142 in "thread.c"
  [7] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [8] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0
(dbx) p *addr
*addr = (nil)
(dbx) up
Current function is core_create_conn
 3895       apr_sockaddr_ip_get(&c->local_ip, c->local_addr);
(dbx) p *c
*c = { pool = 0x1ebb3b8 base_server = (nil) vhost_lookup_data = (nil) local_addr = (nil) remote_addr = (nil) remote_ip = (nil) remote_host = (nil) remote_logname = (nil) aborted = 0 keepalive = AP_CONN_UNKNOWN double_reverse = 0 keepalives = 0 local_ip = (nil) local_host = (nil) id = 0 conn_config = 0x1ebb508 notes = 0x1ebba48 input_filters = (nil) output_filters = (nil) sbh = 0x1ebb458 bucket_alloc = (nil) cs = (nil) data_in_input_filters = 0 }
(dbx) dump
alloc = 0x2128b58
rv = 0
ptrans = 0x1ebb3b8
server = 0x4c0200
sbh = 0x1ebb458
c = 0x1ebb460
id = 311
csd = 0x1ebb728
(dbx) p csd
csd = 0x1ebb728
(dbx) p *(struct apr_socket_t*) csd
*((struct apr_socket_t *) csd) = { pool = (nil) socketdes = 0 type = 0 protocol = 0 local_addr = (nil) remote_addr = (nil) timeout = 0 local_port_unknown = 0 local_interface_unknown = 0 remote_addr_unknown = 0 options = 0 inherit = 0 userdata = (nil) }
I assume that the ptrans pool somehow gets corrupted. I guess it is used by two threads in parallel, which could lead to corruption since pools as such are not thread-safe. So I think a good starting point for further investigation would be ap_queue_info_wait_for_idler in mpm/worker/fdqueue.c, or lines 731 - 740 in worker.c:

        if (ptrans == NULL) {
            /* we can't use a recycled transaction pool this time.
             * create a new transaction pool */
            apr_allocator_t *allocator;

            apr_allocator_create(&allocator);
            apr_allocator_max_free_set(allocator, ap_max_mem_free);
            apr_pool_create_ex(&ptrans, pconf, NULL, allocator);
            apr_allocator_owner_set(allocator, ptrans);
        }
Thanks Ruediger for your suggestion. I will try to explore based on your suggestion. Meanwhile here is the 3rd type of crash (with your patch). t@13 (l@13) terminated by signal SEGV (Segmentation Fault) Current function is apr_pool_cleanup_kill 2045 c = p->cleanups; (dbx) where current thread: t@13 =>[1] apr_pool_cleanup_kill(p = 0xa0, data = 0x195b888, cleanup_fn = 0xfffffd7fff223540 = &`libapr-1.so.0.2.11`sockets.c`socket_cleanup(void *sock)), line 2045 in "apr_pools.c" [2] apr_pool_cleanup_run(p = 0xa0, data = 0x195b888, cleanup_fn = 0xfffffd7fff223540 = &`libapr-1.so.0.2.11`sockets.c`socket_cleanup(void *sock)), line 2088 in "apr_pools.c" [3] apr_socket_close(thesocket = 0x195b888), line 149 in "sockets.c" [4] ap_lingering_close(c = 0x17407f0), line 135 in "connection.c" [5] process_socket(p = 0x1740748, sock = 0x195b888, my_child_num = 1, my_thread_num = 10, bucket_alloc = 0x195b728), line 569 in "worker.c" [6] worker_thread(thd = 0x52bb48, dummy = 0x4f3480), line 951 in "worker.c" [7] dummy_worker(opaque = 0x52bb48), line 142 in "thread.c" [8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7 [9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0 (dbx) up Current function is apr_pool_cleanup_run 2088 apr_pool_cleanup_kill(p, data, cleanup_fn); (dbx) up Current function is apr_socket_close 149 return apr_pool_cleanup_run(thesocket->pool, thesocket, socket_cleanup); (dbx) p *thesocket *thesocket = { pool = 0xa0 socketdes = 26588968 type = 0 protocol = 26588928 local_addr = 0x195b7e8 remote_addr = 0x1741158 timeout = 24383832 local_port_unknown = 4895848 local_interface_unknown = 0 remote_addr_unknown = 0 options = 0 inherit = 0 userdata = (nil) } (dbx) dump thesocket = 0x195b888 (dbx) up Current function is ap_lingering_close 135 apr_socket_close(csd); (dbx) dump timeup = 0 dummybuf = "" c = 0x17407f0 nbytes = 4294967296U csd = 0x195b888 (dbx) p *c *c = { pool = 0x1740748 base_server = 0x4c0300 vhost_lookup_data = (nil) local_addr = 0x195b8d8 remote_addr = 0x195ba18 remote_ip = 0x1740f88 "192.168.22.2" remote_host = (nil) remote_logname = (nil) aborted = 0 keepalive = AP_CONN_UNKNOWN double_reverse = 0 keepalives = 0 local_ip = 0x1740f78 "192.168.22.1" local_host = (nil) id = 510 conn_config = 0x1740898 notes = 0x1740dd8 input_filters = 0x1740fa8 output_filters = 0x1740fd0 sbh = 0x17407e8 bucket_alloc = 0x195b728 cs = (nil) data_in_input_filters = 0 } (dbx) _arch_networkio.h`struct apr_socket_t*)csd < *((struct apr_socket_t *) csd) = { pool = 0xa0 socketdes = 26588968 type = 0 protocol = 26588928 local_addr = 0x195b7e8 remote_addr = 0x1741158 timeout = 24383832 local_port_unknown = 4895848 local_interface_unknown = 0 remote_addr_unknown = 0 options = 0 inherit = 0 userdata = (nil) } (dbx) etworkio.h`struct apr_socket_t*)csd->local_addr < dbx: can't find field "local_addr" in "*(csd)" (dbx) p ((`srclib/apr/include/arch/unix/apr_arch_networkio.h`struct apr_socke > ((struct apr_socket_t *) csd)->local_addr = 0x195b7e8 (dbx) p *((`srclib/apr/include/arch/unix/apr_arch_networkio.h`struct apr_sock > *((struct apr_socket_t *) csd)->local_addr = { pool = 0xa0 hostname = 0x195b728 "H^Gt^A" servname = 0x195b700 "" port = 0 family = 0 salen = 24383744U ipaddr_len = 0 addr_str_len = 24383744 ipaddr_ptr = 0xfffffd7fff2e8010 next = (nil) sa = { sin = { sin_family = 0 sin_port = 0 sin_addr = { S_un = { S_un_b = { s_b1 = '\0' s_b2 = '\0' s_b3 = '\0' s_b4 = '\0' } S_un_w = { s_w1 = 0 s_w2 = 0 } S_addr = 0 } } sin_zero = "" } sin6 = { sin6_family = 0 sin6_port = 0 
sin6_flowinfo = 0 sin6_addr = { _S6_un = { _S6_u8 = "" _S6_u32 = (0, 0, 4373928U, 0) __S6_align = 0 } } sin6_scope_id = 26588968U __sin6_src_id = 0 } sas = { ss_family = 0 _ss_pad1 = "" _ss_align = 0.0 _ss_pad2 = "xxB" } } }
As you wrote Solaris on Sun, I suppose you mean SPARC. Have you checked whether the crashes happen with the same Solaris version on x86? Background of the question: ap_queue_info_wait_for_idler uses atomics whose implementation depends on the hardware architecture. Non-functional atomics could be a source of concurrency problems under load.
(In reply to comment #10)
> As you wrote Solaris on Sun, I suppose you mean SPARC. Have you checked whether the
> crashes happen with the same Solaris version on x86? Background of the question:

Oops, my fault: you already said that you are using x86. Nevertheless, does the same happen on SPARC with the same Solaris version, or if you compile with --enable-nonportable-atomics=no?
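For reference, apr_atomic_casptr is a compare-and-swap on a pointer. The following is a rough sketch of the semantics it is expected to provide; it is illustrative only and not APR's actual implementation, which performs the whole operation indivisibly (via a native instruction, or under an internal mutex in the generic fallback):

    /* Illustrative only: the observable behaviour of a pointer compare-and-swap.
     * Atomically: if *mem equals cmp, store 'with' into *mem; in all cases
     * return the value *mem held before the operation. Callers detect success
     * by checking that the returned value equals cmp. */
    static void *casptr_semantics(void *volatile *mem, void *with, const void *cmp)
    {
        void *old = *mem;       /* the read, compare and store must happen   */
        if (old == cmp)         /* as one indivisible step in a real         */
            *mem = with;        /* implementation; they are written          */
        return old;             /* sequentially here only to show the idea   */
    }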
Thanks, Ruediger, for your pointer. It was really useful. Regarding the function ap_queue_info_wait_for_idler (lines 188-196):

188:         struct recycled_pool *first_pool = queue_info->recycled_pools;
189:         if (first_pool == NULL) {
190:             break;
191:         }
192:         if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
193:                               first_pool) == first_pool) {
194:             *recycled_pool = first_pool->pool;
195:             break;
196:         }

I will abbreviate queue_info->recycled_pools as qi->rp to keep it short. Inside apr_atomic_casptr we acquire a mutex, so I will describe this as three steps:

1. Calculate first_pool.
2. Calculate first_pool->next and invoke apr_atomic_casptr.
3. Acquire the lock on qi->rp (inside apr_atomic_casptr).

There is a clear race condition between the two statements (line 188 and line 193), i.e. between steps 2 and 3. I agree that &queue_info->recycled_pools itself is protected and updated atomically, but the next pointers (first_pool and first_pool->next) are not protected correctly.

To prove my point, here is an example. Suppose at a particular moment the recycled pool list is 1 --> 2 --> 3, where 1, 2, 3 are pool nodes, so qi->rp = 1. Now consider the following interleaving:

Thread 1: first_pool = 1; first_pool->next = 2. Before step 3 is executed, i.e. before the lock on qi->rp is acquired, a context switch happens.
Thread 2: pops a node (1) from the list, so the list becomes 2 --> 3.
Thread 3: pops another node (2) from the list, so the list becomes 3.
Thread 2: pushes its node back, so the list becomes 1 --> 3.
Thread 1: first_pool->next is still 2 and qi->rp is still 1. The thread acquires the lock and atomically compares qi->rp with 1 and swaps in 2. The CAS succeeds because qi->rp was 1, but the head now becomes 2 even though node 2 has already been handed out, so the list becomes 2 (or 2 --> 3) and is corrupted.

I believe I can prove my point with a sample standalone application. For now, I used a separate mutex and protected both qi->rp and the next pointers. I tried the attached patch; with it, I was able to run the stress test for more than 10 hours without any crash. Without the patch, the crash used to happen in less than 30 minutes.
Here is the patch which I tried:

---------------------------------------------------------------------------
--- orghttpd-2.2.6/server/mpm/worker/fdqueue.c  Wed Jul 25 06:13:49 2007
+++ httpd-2.2.6/server/mpm/worker/fdqueue.c     Fri Feb 15 10:57:42 2008
@@ -25,6 +25,7 @@
 struct fd_queue_info_t {
     apr_uint32_t idlers;
     apr_thread_mutex_t *idlers_mutex;
+    apr_thread_mutex_t *queue_mutex;
     apr_thread_cond_t *wait_for_idler;
     int terminated;
     int max_idlers;
@@ -36,6 +37,7 @@
     fd_queue_info_t *qi = data_;
     apr_thread_cond_destroy(qi->wait_for_idler);
     apr_thread_mutex_destroy(qi->idlers_mutex);
+    apr_thread_mutex_destroy(qi->queue_mutex);

     /* Clean up any pools in the recycled list */
     for (;;) {
@@ -65,6 +67,11 @@
     if (rv != APR_SUCCESS) {
         return rv;
     }
+    rv = apr_thread_mutex_create(&qi->queue_mutex, APR_THREAD_MUTEX_DEFAULT,
+                                 pool);
+    if (rv != APR_SUCCESS) {
+        return rv;
+    }
     rv = apr_thread_cond_create(&qi->wait_for_idler, pool);
     if (rv != APR_SUCCESS) {
         return rv;
@@ -93,14 +100,14 @@
         new_recycle = (struct recycled_pool *)apr_palloc(pool_to_recycle,
                                                          sizeof(*new_recycle));
         new_recycle->pool = pool_to_recycle;
-        for (;;) {
-            new_recycle->next = queue_info->recycled_pools;
-            if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
-                                  new_recycle, new_recycle->next) ==
-                new_recycle->next) {
-                break;
-            }
-        }
+        rv = apr_thread_mutex_lock(queue_info->queue_mutex);
+        if (rv != APR_SUCCESS)
+            return rv;
+        new_recycle->next = queue_info->recycled_pools;
+        queue_info->recycled_pools = new_recycle;
+        rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
+        if (rv != APR_SUCCESS)
+            return rv;
     }

     /* Atomically increment the count of idle workers */
@@ -182,19 +189,18 @@

     /* Atomically decrement the idle worker count */
     apr_atomic_dec32(&(queue_info->idlers));
-
-    /* Atomically pop a pool from the recycled list */
-    for (;;) {
+    rv = apr_thread_mutex_lock(queue_info->queue_mutex);
+    if (rv != APR_SUCCESS)
+        return rv;
+    if (queue_info->recycled_pools) {
         struct recycled_pool *first_pool = queue_info->recycled_pools;
-        if (first_pool == NULL) {
-            break;
-        }
-        if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
-            first_pool) == first_pool) {
-            *recycled_pool = first_pool->pool;
-            break;
-        }
+        queue_info->recycled_pools = first_pool->next;
+        *recycled_pool = first_pool->pool;
+        first_pool->next = NULL;
     }
+    rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
+    if (rv != APR_SUCCESS)
+        return rv;

     if (queue_info->terminated) {
         return APR_EOF;
---------------------------------------------------------------------------

If you agree that there is a clear race condition, then to correct it I have the following suggestions:

(a) Use a dedicated pool for each worker thread; this avoids any locking. It will perform better but may require a little more memory in situations where the worker threads are not fully used. (A rough sketch of this idea follows below.)
(b) Use some technique other than a recycled pool list that avoids race conditions.

I am in favour of option (a) until some good idea for (b) comes to my mind. If you agree with (a), I can work on it and generate a patch.

Note: I also believe that the crash will happen on Linux too. I never ran the test for more than 1 hour on Linux; I will try that tonight.
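The following is a hypothetical sketch of option (a) as described above: each worker thread owning one long-lived transaction pool and clearing it in place between connections, instead of handing pools back through a shared recycled list. It is illustrative only, not a proposed patch; the function and variable names are made up for the example, and it glosses over how the listener thread would obtain the pool at accept time. ap_max_mem_free is httpd's existing MaxMemFree setting.

    #include "apr_pools.h"
    #include "apr_allocator.h"
    #include "apr_thread_proc.h"
    #include "mpm_common.h"   /* for ap_max_mem_free */

    /* Hypothetical illustration of option (a): one private transaction pool
     * per worker thread, cleared after every connection, so no shared
     * recycled-pool list (and no atomic push/pop) is needed. */
    static void * APR_THREAD_FUNC worker_thread_sketch(apr_thread_t *thd, void *arg)
    {
        apr_allocator_t *allocator;
        apr_pool_t *ptrans;
        int terminated = 0;              /* set on graceful shutdown (omitted) */

        apr_allocator_create(&allocator);
        apr_allocator_max_free_set(allocator, ap_max_mem_free);
        apr_pool_create_ex(&ptrans, NULL, NULL, allocator);  /* thread-private */
        apr_allocator_owner_set(allocator, ptrans);

        while (!terminated) {
            /* ... wait for a connection handed over by the listener and
             * process it using ptrans ... */
            apr_pool_clear(ptrans);      /* recycle in place, no shared list */
        }

        apr_pool_destroy(ptrans);
        apr_thread_exit(thd, APR_SUCCESS);
        return NULL;
    }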
(In reply to comment #12)
> If you agree that there is a clear race condition, then to correct it I have
> the following suggestions:
> (a) Use a dedicated pool for each worker thread; this avoids any locking.
> It will perform better but may require a little more memory in situations
> where the worker threads are not fully used.
> (b) Use some technique other than a recycled pool list that avoids race
> conditions.
>
> I am in favour of option (a) until some good idea for (b) comes to my mind.
> If you agree with (a), I can work on it and generate a patch.
>
> Note: I also believe that the crash will happen on Linux too. I never ran the
> test for more than 1 hour on Linux; I will try that tonight.

Thank you for your thorough investigation. I agree with you that we have the described race conditions here. We have a similar race in the event MPM.

Next steps:

1. Bring your patch above into trunk. Currently I see no significant performance loss over the current code, as we are using a mutex there as well; we only increase the time during which we lock the resource. I don't know right now when I will find the cycles to apply the patch to trunk, but if you could attach a trunk version of your patch to this report it would be a big help.

2. Move the further discussion regarding options a) and b) to dev@httpd.apache.org, and let's wait for its results to decide how to move along and improve the situation here in the long run.
I think I have to correct myself on two points.

1. On APR trunk there are better implementations of apr_atomic_casptr which no longer use a mutex, but native platform processor / OS features. So, in contrast to my first assumption, there could be a performance degradation from your patch on trunk, which would be bad.

2. The race scenario you described cannot happen in this way, because it assumes that multiple threads pop pools from the list in parallel. This is not the case, as only the listener thread does this. What happens in parallel is:

- multiple pushes to the list
- (multiple) pushes to the list and a pop

OTOH I still believe that there is some kind of race scenario, as your patch showed that the error goes away if the locking / syncing is changed here. So maybe it's only a different scenario (that I haven't figured out so far), or there is a bug in apr_atomic_casptr. Do the same crashes happen with trunk?
Regarding the example given in comment #12, I need to correct myself. I agree with you that the example is not valid for the worker implementation, because there is a single thread that pops nodes and multiple threads that push them. (ap_queue_info_wait_for_idler is not thread-safe, but it is not called by multiple threads; it is only invoked by the single listener thread.) I have not yet been able to think of a race condition in which a single popping thread and several pushing threads corrupt the recycled_pool list. I am still working on finding the real cause of the crashes.
A few more updates:

* These crashes probably exist on Linux (64-bit) too, but I can't say for sure. I have seen 3 crashes so far; out of those 3, I got a core dump only once, and the stack trace from that core dump didn't make much sense to me, so I can't say for sure whether the bug reproduces on Linux. (Linux here is 64-bit Fedora 8 with a 64-bit Apache.)

On Solaris, I tried the following things:

* Replaced apr_atomic_casptr with Solaris's atomic_casptr, but the result remained the same; I still saw the crashes. This suggests that this may not be an APR bug.

* If I replace the apr_atomic_casptr code but keep the for loop, the crashes disappear.

In ap_queue_info_set_idle:

    if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
                          new_recycle, new_recycle->next) ==
        new_recycle->next) {
        break;
    }

replaced with:

    rv = apr_thread_mutex_lock(queue_info->queue_mutex);
    if (queue_info->recycled_pools == new_recycle->next) {
        queue_info->recycled_pools = new_recycle;
        success = 1;
    }
    rv = apr_thread_mutex_unlock(queue_info->queue_mutex);

In ap_queue_info_wait_for_idler:

    if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
                          first_pool) == first_pool) {
        *recycled_pool = first_pool->pool;
        break;
    }

replaced with:

    rv = apr_thread_mutex_lock(queue_info->queue_mutex);
    if (queue_info->recycled_pools == first_pool) {
        queue_info->recycled_pools = next;
        success = 1;
    }
    rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
Created attachment 21581: Patch for httpd-2.2.8

Eventually I figured out where the real race condition is. Lines 96-102 of fdqueue.c (in httpd-2.2.8):

    for (;;) {
        new_recycle->next = queue_info->recycled_pools;
        if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
                              new_recycle, new_recycle->next) ==
            new_recycle->next) {
            break;
        }
    }

The race condition is between the return of apr_atomic_casptr and the re-evaluation of new_recycle->next. Let us break the loop body into four steps (qi->rp is queue_info->recycled_pools):

1. Set new_recycle->next to qi->rp.
2. Atomically compare qi->rp with new_recycle->next and swap in new_recycle if it matches.
3. Calculate new_recycle->next again.
4. Decide whether the apr_atomic_casptr call was successful based on the return value and the result of step 3.

The race condition is between step 2 and step 3. If apr_atomic_casptr was successful (i.e. it swapped the value) and there is a context switch between steps 2 and 3, then new_recycle->next can point to something else and can even be corrupted, with the result that the if condition fails. I saved new_recycle->next in a local variable and then used that local variable, as shown in the patch, and the issue was resolved.

Here is an example of how new_recycle->next can be changed by a race. Suppose our list is 1 --> 2 --> 3, where 1, 2, 3 are list nodes, and worker thread 1 wants to add a node 4 at the head. Here is how it goes:

Worker thread 1: new_recycle = 4; qi->rp = 1; new_recycle->next = 1. apr_atomic_casptr successfully compares and swaps, so qi->rp = 4 and the list becomes 4 --> 1 --> 2 --> 3. Now a context switch happens.
Listener thread: qi->rp is 4, so it pops node 4 and gives it to worker thread 2; the list becomes 1 --> 2 --> 3. The listener thread pops another node and gives it to worker thread 3, and the list becomes 2 --> 3.
Worker thread 2: returns node 4 to the list, and the list becomes 4 --> 2 --> 3.
Worker thread 1: new_recycle->next has now become 2; the code compares it with 1 (the value returned by the CAS), and hence the comparison fails.

The real situation can be a little different from what I described, because before the worker thread returns node 4 to the list, the pool is cleared (line 897 of worker.c, in the worker_thread function):

    apr_pool_clear(ptrans);
    last_ptrans = ptrans;

which means new_recycle->next can be corrupted and point to freed memory.

The way I confirmed this is that if you put an assert statement like:

    struct recycled_pool *next = queue_info->recycled_pools;
    new_recycle->next = next;
    if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
                          new_recycle, new_recycle->next) == next) {
        ap_assert(next == new_recycle->next);
        break;
    }

then the assertion fails under stress.

The bug also exists in the event MPM (server/mpm/experimental/event/fdqueue.c).

Ruediger, can you review the patch? The patch is against 2.2.8. Should I submit a patch against trunk?
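To make the fix concrete, here is a minimal sketch of the corrected push loop; it is a simplified rendering of what the attached patch does rather than a verbatim quote of it. The key point is that the head pointer is read exactly once into a local variable, and that snapshot is used both as the new node's next pointer and as the CAS comparand, so nothing is re-read after the swap.

    for (;;) {
        /* single snapshot of the head; never re-read after the CAS */
        struct recycled_pool *next = queue_info->recycled_pools;

        new_recycle->next = next;
        if (apr_atomic_casptr((volatile void **)&(queue_info->recycled_pools),
                              new_recycle, next) == next) {
            break;  /* push succeeded: head advanced from 'next' to new_recycle */
        }
        /* another thread changed the head meanwhile; retry with a fresh snapshot */
    }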
Since the crash happens on Linux too, I am changing the summary.
Created attachment 21582: Revised patch

Made a small correction in the comments of the patch.
Fixed in trunk in r630335.
Great work Basant. Nick beat me to committing your patch, but in the meantime I applied your patch to the event MPM on trunk as well (r630348).
*** Bug 44474 has been marked as a duplicate of this bug. ***
Backported to 2.2.x as r631362 (http://svn.apache.org/viewvc?rev=631362&view=rev).
*** Bug 42086 has been marked as a duplicate of this bug. ***
*** Bug 41030 has been marked as a duplicate of this bug. ***