Bug 39985 - mod_dbd shutdown double free via dbd_close
Summary: mod_dbd shutdown double free via dbd_close
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_dbd
Version: 2.2.2
Hardware: PC Linux
Importance: P2 regression
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-07-06 23:42 UTC by Matthew Kent
Modified: 2007-08-27 10:21 UTC
CC List: 3 users



Attachments
gdb session (4.19 KB, text/plain)
2006-07-06 23:44 UTC, Matthew Kent
Details
Memory pool fixes for trunk. (10.95 KB, patch)
2006-12-07 13:56 UTC, Chris Darroch
Details | Diff
slightly changed for http_request.h include fix (10.95 KB, patch)
2006-12-07 14:29 UTC, Chris Darroch
Details | Diff
same thing for 2.2.x branch (10.54 KB, patch)
2006-12-07 14:29 UTC, Chris Darroch
Details | Diff
Conditionally comment out the explicit pool destructor (327 bytes, patch)
2006-12-07 17:36 UTC, Alex Dubov
Details | Diff
catch some error condition leaks in dbd_construct (13.62 KB, patch)
2006-12-09 21:54 UTC, Chris Darroch
Details | Diff
same thing for 2.2.x branch (11.70 KB, patch)
2007-01-20 12:20 UTC, Peter Poeml
Details | Diff

Description Matthew Kent 2006-07-06 23:42:36 UTC
My httpd threads are all exiting with:
*** glibc detected *** double free or corruption (!prev): 0xb8da2940 ***
on shutdown. And due to the MALLOC_CHECK this causes some issues.

I've isolated the problem to mod_dbd, more specifically to the
apr_pool_destroy(conn->pool) called in dbd_close during the shutdown cleanup. If
I comment out that destroy, things are fine.

This didn't happen in an older release of mod_dbd, but I've noticed there was
some reworking of the pools in this version.

I'll attach my gdb session where you can clearly see the problem; I couldn't
figure out why this is happening, though, as I'm no debugging or APR wizard.

If it matters I've tried apr/apr-util 1.2.2 and 1.2.7 with the same results.
Comment 1 Matthew Kent 2006-07-06 23:44:27 UTC
Created attachment 18565 [details]
gdb session

attaching to a running thread and sending it a SIGTERM
Comment 2 Nick Kew 2006-08-08 21:50:47 UTC
ISTR discussing this with someone recently.  Was that you?

Does this happen during server operation, or is this at server shutdown?
And what backend and dbd-user modules are you using?

Any change with 2.2.3?
Comment 3 Matthew Kent 2006-08-09 18:36:14 UTC
Don't think it was me, I've never discussed this outside of this ticket.

- The problem happens during shutdown.
- I'm using a current copy of the apr_dbd_mysql.c backend compiled against MySQL
4.1.20
- I'm using a copy of mod_vhost_dbi modified to work with dbd instead. But the
problem happens regardless of what modules are loaded. I can run a single thread
with the bare minimum (no 3rd party modules or my own) and reproduce the problem.
- Just retested on 2.2.3 and got the same result.
Comment 4 Nick Kew 2006-08-09 19:38:34 UTC
OK, there's clearly a genuine bug here, though I'm struggling to see it in the 
mod_dbd code (which leads me to wonder about either the pools or - more 
likely - the mysql driver).  Bug me if nothing happens on this.

Has anyone seen this bug with drivers other than MySQL?
Comment 5 Nick Kew 2006-08-17 14:30:02 UTC
I tried replying by email to try and sort this, but maybe your address here is 
bogus.

=====================
Do you mind my emailing you about this?

(1) You are linking to the thread-safe libmysqlclient_r, aren't you?

(2) Can you get a traceback from this, and see if it's coming from within 
function thread_end and/or mysql_thread_end?

(3) If you can't do that, please try commenting out line 790 in 
apr_dbd_mysql.c, and tell me if that makes any difference.  Line 790 is
an apr_pool_cleanup_register in function dbd_mysql_init.  You'll see my
comment about being uncertain there!

Hope we can fix this, but I don't get the problem, so I need you to test
theories about the likely cause.
==========================
Comment 6 Alex Dubov 2006-12-03 22:29:43 UTC
I can verify this bug (httpd-2.2.3). It appears that the pool that hosts the
apr_dbd_t struct (the one passed to apr_dbd_open) gets destroyed before
apr_dbd_close is run. Easy to verify with "httpd -X" and gdb by setting a
watchpoint on the apr_dbd_t.
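A watchpoint session along these lines should reproduce what Alex describes (the breakpoint location and the variable name `sql` standing in for the returned apr_dbd_t are assumptions; adjust them to the frames and names you actually see):

```
$ gdb --args /usr/local/apache2/bin/httpd -X
(gdb) break apr_dbd_open
(gdb) run
(gdb) finish                  # let apr_dbd_open return its apr_dbd_t
(gdb) watch *(long *)sql      # watch the handle's memory
(gdb) continue
# now send SIGTERM; if the watchpoint fires from inside apr_pool_destroy()
# before apr_dbd_close() is ever reached, the pool is being torn down early
```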
 
Comment 7 Alex Dubov 2006-12-03 22:37:04 UTC
I also think that this bug does not turn up more often because the junk value
in apr_dbd_t->conn is most often NULL, so it gets silently eaten by
mysql_close (for example). This even happens with pool debugging enabled (and
pools are supposed to be poisoned); httpd -X is a safe way to trigger a real
fault.
 
Comment 8 Alex Dubov 2006-12-06 22:12:11 UTC
It appears that this problem is caused by pool destruction on line 316 in 
mod_dbd.c: 
--- 
apr_pool_destroy(conn->pool); 
--- 
The pool in question may be (as it happens on my machine) already destroyed by 
its parent (I'm using mpm_prefork - have to try yet with mpm_worker). 
Commenting this line out seems to solve the problem, and the database handle is
still closed (or at least it appears that way).
 
Comment 9 Chris Darroch 2006-12-06 22:34:20 UTC
Yes, I'd spotted that as well and might have a patch tomorrow that
addresses the issue -- ping me if I'm slow!
Comment 10 Chris Darroch 2006-12-07 13:56:11 UTC
Created attachment 19226 [details]
Memory pool fixes for trunk.

These patches aim to prevent the following situation, which I suspect was
the cause of these bug reports.  (I myself have rarely seen the problem,
so if someone who can replicate it reliably could try either of these patches,
that would be very helpful!)

When using a reslist, the reslist internally registers a cleanup function
on the memory pool used for the reslist, specifically the reslist_cleanup()
function.  This function calls the destructor on each resource.  For mod_dbd,
that destructor is dbd_destruct(), which just calls dbd_close().  Now
dbd_close() calls apr_pool_destroy() on the memory sub-pool that was
created in dbd_construct() for use by that particular DB connection; this
is where the connection's prepared statements are created.

Normally, we want this memory pool to be destroyed when dbd_destruct() or
dbd_close() is called -- suppose the reslist is expiring a resource, i.e.,
a DB connection, we want it to (a) close the DB connection and (b) reclaim
the memory used for that connection's prepared statements.

OK, but when the parent memory pool (the one used by the reslist itself)
is destroyed, apr_pool_destroy() first destroys all the sub-pools of that
pool -- in this case, these are the per-connection sub-pools.  Then it calls
all the cleanup functions on the pool, including the reslist cleanup
function, which in turn calls dbd_destruct() on each connection, causing
apr_pool_destroy() to be called twice for each sub-pool and presumably
sometimes producing the segfaults people have reported.

Let me know if these patches fix the problem (and don't introduce new
problems!) and I'll commit the trunk version, and propose the fix for
2.2.x.	Review of the no-threads logic would be especially helpful.
Thanks!
Comment 11 Chris Darroch 2006-12-07 14:29:26 UTC
Created attachment 19227 [details]
slightly changed for http_request.h include fix

I should note that these patches also fix several things I discovered while
testing the no-threads persist and no-persist cases.  The ap_dbd_[c]acquire()
no-threads functions used the wrong value in their apr_pool_cleanup_register()
calls, leading to segfaults when the cleanups ran.

Also, I noticed that in the no-threads persist case (using the worker MPM),
DB connections were never closed -- not even on shutdown.  This turns
out to be because apr_pool_destroy() isn't called on s->process->pool
when a child process exits; the worker MPM, at least, only performs
apr_pool_destroy() on its pchild pool.	Most other MPMs seem similar.
(Maybe beos MPM doesn't call child_init hooks at all?  Yikes.)	Anyway,
if we stash the per-process lifetime pool at child_init time, that does
get destroyed by each child process and it also clarifies the code a little.
Comment 12 Chris Darroch 2006-12-07 14:29:44 UTC
Created attachment 19228 [details]
same thing for 2.2.x branch
Comment 13 Alex Dubov 2006-12-07 17:35:56 UTC
I think I got a little confused by mod_dbd - in fact I think that the MPMs are
OK. The problem, as I understand it now, is strictly in mod_dbd. It has two
different ap_dbd_open functions: one for APR_HAS_THREADS and one for the other
case. The no-threads ap_dbd_open requires the particular pool to be explicitly
destroyed (it doesn't use a reslist), so dbd_close has an apr_pool_destroy.
However, the threads version uses a reslist and calls dbd_close as a resource
destructor, bombing itself (I hadn't noticed this yesterday). So, probably the
simplistic patch for mod_dbd should do it.
Comment 14 Alex Dubov 2006-12-07 17:36:55 UTC
Created attachment 19229 [details]
Conditionally comment out the explicit pool destructor
Comment 15 Chris Darroch 2006-12-07 23:02:10 UTC
Please take a look at the patches I attached today, and read over the comments
I put in the patched code, where I tried to explain what I think should be
happening.  The reason dbd_close() had an apr_pool_destroy() is not just
because of the no-threads cases, it's also because the reslist may decide
to call dbd_destruct() on excess or invalid resources at any time, and when it
does, you want that to clean up the per-connection pool (the one used for
prepared statements for that connection).

However, I suspect your crashes are coming about because of the particular
sequence of events on shutdown, when the apr_pool_destroy() called in the
MPM on the top-level pool effectively calls apr_pool_destroy() on the
per-connection sub-pools first -- then asks the reslist to clean itself
up, at which point it calls dbd_destruct(), which does the second, bad
apr_pool_destroy().  But in general, dbd_destruct() does need to do that
apr_pool_destroy(), so we need some signalling around that.

Please try with the patches I provided and see if they deal with your
particular issue; if not, let me know and we can probe further.  Thanks!
Comment 16 Alex Dubov 2006-12-07 23:28:02 UTC
Yes, I checked your patch (19228) and it appears to work. I can't check it now 
with mpm_worker, but I'll do it some time later. 
Comment 17 Joe Orton 2006-12-08 03:29:52 UTC
Chris, please don't change the bug assignee field! It's the only way the bugs@
mailing list gets CCed on changes.
Comment 18 Chris Darroch 2006-12-08 09:12:27 UTC
Oops, sorry, didn't know!  Thanks for catching that.
Comment 19 Chris Darroch 2006-12-09 21:54:03 UTC
Created attachment 19238 [details]
catch some error condition leaks in dbd_construct
Comment 20 Chris Darroch 2006-12-28 14:57:05 UTC
The patches attached so far don't completely solve all the problems
identified in the report.  For the latest, see the first three patches
(1tidy, 2misc, and 3pools) for httpd trunk in:

http://people.apache.org/~chrisd/patches/mod_dbd_pools_groups/

These should be applied sequentially.  The third (3pools) patch is the one
that actually addresses the original issue from this report.  It makes use
of the fact that MPMs (should) call apr_pool_destroy() on the pool they
pass to the child_init hook functions, when each child process exits.  However,
they don't call apr_pool_destroy() on s->process->pool in child processes.
This allows us to work around the interaction of APR reslists and memory
pools at pool destruction time.
Comment 21 Chris Darroch 2007-01-16 11:44:11 UTC
The issues raised in this bug report should, I hope, be fixed in this
revision of mod_dbd in httpd trunk:

http://svn.apache.org/viewvc?view=rev&revision=496831

Note that the APR DBD Oracle driver needs the patch attached in PR #41250
to work with this mod_dbd fix.  No other APR DBD drivers use memory sub-pools.
Comment 22 Peter Poeml 2007-01-20 12:20:27 UTC
Created attachment 19438 [details]
same thing for 2.2.x branch
Comment 23 Peter Poeml 2007-01-20 12:30:47 UTC
I hope I got the backport right. It works for me, so far. 

I didn't test other drivers than mysql, but I tried it with both prefork and worker.

My testcase is as described below.

The entire conf is:
--------------
User wwwrun
Group www
ErrorLog /var/log/apache2/error_log
Listen 81

DBDriver mysql
DBDParams "host=localhost, user=theuser, pass=thepass, dbname=thedbname"
<Directory /srv/www/htdocs/mysql-test>
        AuthType Basic
        AuthName "MySQL Testing"
        AuthBasicProvider dbd
        Require valid-user
        AuthDBDUserPWQuery "select password from user where username=%s"
</Directory>
---------------

The server is built with the 2.2.4 tarball, putting
http://apache.webthing.com/svn/apache/apr/apr_dbd_mysql.c into srclib/apr-util.
In order to build the MySQL driver, I used the following patch:

Index: build-outputs.mk
===================================================================
--- build-outputs.mk.orig       2006-11-29 12:48:46.000000000 +0100
+++ build-outputs.mk    2006-12-13 17:17:33.413014156 +0100
@@ -45,8 +45,9 @@ dbd/apr_dbd.lo: dbd/apr_dbd.c .make.dirs
 dbd/apr_dbd_sqlite2.lo: dbd/apr_dbd_sqlite2.c .make.dirs
 dbd/apr_dbd_sqlite3.lo: dbd/apr_dbd_sqlite3.c .make.dirs
 dbd/apr_dbd_pgsql.lo: dbd/apr_dbd_pgsql.c .make.dirs
+dbd/apr_dbd_mysql.lo: dbd/apr_dbd_mysql.c .make.dirs include/apu_version.h
 
-OBJECTS_all = buckets/apr_buckets_pipe.lo buckets/apr_buckets_flush.lo buckets/apr_buckets_alloc.lo buckets/apr_buckets_pool.lo buckets/apr_buckets_socket.lo buckets/apr_buckets_heap.lo buckets/apr_buckets_simple.lo buckets/apr_buckets_file.lo buckets/apr_buckets.lo buckets/apr_buckets_mmap.lo buckets/apr_buckets_eos.lo buckets/apr_brigade.lo buckets/apr_buckets_refcount.lo crypto/apr_sha1.lo crypto/uuid.lo crypto/getuuid.lo crypto/apr_md5.lo crypto/apr_md4.lo dbm/apr_dbm.lo dbm/apr_dbm_berkeleydb.lo dbm/apr_dbm_gdbm.lo dbm/apr_dbm_ndbm.lo dbm/apr_dbm_sdbm.lo dbm/sdbm/sdbm_pair.lo dbm/sdbm/sdbm.lo dbm/sdbm/sdbm_hash.lo dbm/sdbm/sdbm_lock.lo encoding/apr_base64.lo hooks/apr_hooks.lo ldap/apr_ldap_url.lo ldap/apr_ldap_option.lo ldap/apr_ldap_init.lo misc/apr_reslist.lo misc/apu_version.lo misc/apr_date.lo misc/apr_rmm.lo misc/apr_queue.lo uri/apr_uri.lo xml/apr_xml.lo strmatch/apr_strmatch.lo xlate/xlate.lo dbd/apr_dbd.lo dbd/apr_dbd_sqlite2.lo dbd/apr_dbd_sqlite3.lo dbd/apr_dbd_pgsql.lo
+OBJECTS_all = buckets/apr_buckets_pipe.lo buckets/apr_buckets_flush.lo buckets/apr_buckets_alloc.lo buckets/apr_buckets_pool.lo buckets/apr_buckets_socket.lo buckets/apr_buckets_heap.lo buckets/apr_buckets_simple.lo buckets/apr_buckets_file.lo buckets/apr_buckets.lo buckets/apr_buckets_mmap.lo buckets/apr_buckets_eos.lo buckets/apr_brigade.lo buckets/apr_buckets_refcount.lo crypto/apr_sha1.lo crypto/uuid.lo crypto/getuuid.lo crypto/apr_md5.lo crypto/apr_md4.lo dbm/apr_dbm.lo dbm/apr_dbm_berkeleydb.lo dbm/apr_dbm_gdbm.lo dbm/apr_dbm_ndbm.lo dbm/apr_dbm_sdbm.lo dbm/sdbm/sdbm_pair.lo dbm/sdbm/sdbm.lo dbm/sdbm/sdbm_hash.lo dbm/sdbm/sdbm_lock.lo encoding/apr_base64.lo hooks/apr_hooks.lo ldap/apr_ldap_url.lo ldap/apr_ldap_option.lo ldap/apr_ldap_init.lo misc/apr_reslist.lo misc/apu_version.lo misc/apr_date.lo misc/apr_rmm.lo misc/apr_queue.lo uri/apr_uri.lo xml/apr_xml.lo strmatch/apr_strmatch.lo xlate/xlate.lo dbd/apr_dbd.lo dbd/apr_dbd_sqlite2.lo dbd/apr_dbd_sqlite3.lo dbd/apr_dbd_pgsql.lo dbd/apr_dbd_mysql.lo
 
 OBJECTS_unix = $(OBJECTS_all)
 

Then I did

./configure --enable-dbd --enable-authn_dbd --with-included-apr && \
make && make install

The database account data must be valid.

In order to reproduce the bug, I used the commands
/usr/local/apache2/bin/httpd
kill $(cat /usr/local/apache2/logs/httpd.pid)
Comment 24 Peter Poeml 2007-04-23 14:31:00 UTC
Meanwhile, I run the trunk version of mod_dbd.c (r531611) with 2.2, which works fine for me.