My httpd threads are all exiting with: *** glibc detected *** double free or corruption (!prev): 0xb8da2940 *** on shutdown. And due to the MALLOC_CHECK this causes some issues. I've isolated the problem to mod_dbd, more specifically to the apr_pool_destroy(conn->pool) called in dbd_close during the shutdown cleanup. If I comment out that destroy, things are fine. This didn't happen in an older release of mod_dbd, but I've noticed there was some reworking of the pools in this version. I'll attach my gdb session, where you can clearly see the problem; I couldn't figure out why this is happening, though, as I'm no debugging or APR wiz. If it matters, I've tried apr/apr-util 1.2.2 and 1.2.7 with the same results.
Created attachment 18565 [details] gdb session attaching to a running thread and sending it a SIGTERM
ISTR discussing this with someone recently. Was that you? Does this happen during server operation, or is this at server shutdown? And what backend and dbd-user modules are you using? Any change with 2.2.3?
Don't think it was me, I've never discussed this outside of this ticket. - The problem happens during shutdown. - I'm using a current copy of the apr_dbd_mysql.c backend compiled against MySQL 4.1.20 - I'm using a copy of mod_vhost_dbi modified to work with dbd instead. But the problem happens regardless of what modules are loaded. I can run a single thread with the bare minimum (no 3rd party modules or my own) and reproduce the problem. - Just retested on 2.2.3 and got the same result.
OK, there's clearly a genuine bug here, though I'm struggling to see it in the mod_dbd code (which leads me to wonder about either the pools or - more likely - the mysql driver). Bug me if nothing happens on this. Has anyone seen this bug with drivers other than MySQL?
I tried replying by email to try and sort this, but maybe your address here is bogus. ===================== Do you mind my emailing you about this? (1) You are linking to the thread-safe libmysqlclient_r, aren't you? (2) Can you get a traceback from this, and see if it's coming from within function thread_end and/or mysql_thread_end? (3) If you can't do that, please try commenting out line 790 in apr_dbd_mysql.c, and tell me if that makes any difference. Line 790 is an apr_pool_cleanup_register in function dbd_mysql_init. You'll see my comment about being uncertain there! Hope we can fix this, but I don't get the problem, so I need you to test theories about the likely cause. ==========================
I can verify this bug (httpd-2.2.3). It appears that the pool that hosts the apr_dbd_t struct (the one passed to apr_dbd_open) gets destroyed before apr_dbd_close is run. This is easy to verify with "httpd -X" and gdb by setting a watchpoint on the apr_dbd_t.
I also think that this bug does not turn up more often because the junk value in apr_dbd_t->conn is most often NULL, so it gets silently eaten by mysql_close (for example). This even happens with pool debug enabled (and pools are supposed to be poisoned); httpd -X is a safe way to trigger a real fault.
It appears that this problem is caused by the pool destruction on line 316 in mod_dbd.c: --- apr_pool_destroy(conn->pool); --- The pool in question may already have been destroyed by its parent (as happens on my machine; I'm using mpm_prefork and have yet to try it with mpm_worker). Commenting this line out seems to solve the problem, and the database handle is still closed (or at least it appears that way).
Yes, I'd spotted that as well and might have a patch tomorrow that addresses the issue -- ping me if I'm slow!
Created attachment 19226 [details] Memory pool fixes for trunk. These patches aim to prevent the following situation, which I suspect was the cause of these bug reports. (I myself have rarely seen the problem, so if someone who can replicate it reliably could try either of these patches, that would be very helpful!) When using a reslist, the reslist internally registers a cleanup function on the memory pool used for the reslist, specifically the reslist_cleanup() function. This function calls the destructor on each resource. For mod_dbd, that destructor is dbd_destruct(), which just calls dbd_close(). Now dbd_close() calls apr_pool_destroy() on the memory sub-pool that was created in dbd_construct() for use by that particular DB connection; this is where the connection's prepared statements are created. Normally, we want this memory pool to be destroyed when dbd_destruct() or dbd_close() is called -- if the reslist is expiring a resource, i.e., a DB connection, we want it to (a) close the DB connection and (b) reclaim the memory used for that connection's prepared statements. OK, but when the parent memory pool (the one used by the reslist itself) is destroyed, apr_pool_destroy() first destroys all the sub-pools of that pool -- in this case, these are the per-connection sub-pools. Then it calls all the cleanup functions on the pool, including the reslist cleanup function, which then calls dbd_destruct() on each connection, causing apr_pool_destroy() to be called twice for each sub-pool and presumably sometimes producing the segfaults people have reported. Let me know if these patches fix the problem (and don't introduce new problems!) and I'll commit the trunk version, and propose the fix for 2.2.x. Review of the no-threads logic would be especially helpful. Thanks!
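The ordering described in the comment above can be sketched with a toy model. This is not APR code: toy_pool, toy_conn and every toy_* function are hypothetical stand-ins invented for illustration, and the model only counts destroy requests instead of freeing memory, so the second request is harmless here, whereas the comment above suspects it produces the reported segfaults in the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory pool. NOT the real apr_pool_t. */
typedef struct toy_pool toy_pool;

struct toy_pool {
    toy_pool *child;              /* first sub-pool */
    toy_pool *sibling;            /* next sub-pool of the same parent */
    void (*cleanup)(void *data);  /* at most one cleanup, for brevity */
    void *cleanup_data;
    int destroy_calls;            /* how many times destroy was requested */
};

#define TOY_MAX_POOLS 8
static toy_pool toy_arena[TOY_MAX_POOLS];
static int toy_used;

/* Hand out a pool from the static arena and link it under its parent. */
static toy_pool *toy_pool_create(toy_pool *parent)
{
    toy_pool *p;

    assert(toy_used < TOY_MAX_POOLS);
    p = &toy_arena[toy_used++];
    p->child = NULL;
    p->sibling = NULL;
    p->cleanup = NULL;
    p->cleanup_data = NULL;
    p->destroy_calls = 0;
    if (parent != NULL) {
        p->sibling = parent->child;
        parent->child = p;
    }
    return p;
}

/* Destroy in the order the comment describes: sub-pools first, then cleanups. */
static void toy_pool_destroy(toy_pool *p)
{
    toy_pool *c;

    p->destroy_calls++;
    if (p->destroy_calls > 1) {
        return;  /* a real allocator would be touching freed memory here */
    }
    for (c = p->child; c != NULL; c = c->sibling) {
        toy_pool_destroy(c);
    }
    if (p->cleanup != NULL) {
        p->cleanup(p->cleanup_data);
    }
}

/* Stand-in for a DB connection that owns a per-connection sub-pool. */
typedef struct {
    toy_pool *pool;
} toy_conn;

/* Plays the role of the reslist cleanup calling the connection destructor,
 * which destroys the per-connection sub-pool. */
static void toy_conn_close(void *data)
{
    toy_conn *conn = data;

    toy_pool_destroy(conn->pool);
}

/* Simulate shutdown; returns how often the sub-pool was asked to be destroyed. */
static int toy_shutdown_destroy_calls(void)
{
    static toy_conn conn;
    toy_pool *reslist_pool;

    toy_used = 0;
    reslist_pool = toy_pool_create(NULL);
    conn.pool = toy_pool_create(reslist_pool);
    reslist_pool->cleanup = toy_conn_close;
    reslist_pool->cleanup_data = &conn;

    toy_pool_destroy(reslist_pool);
    return conn.pool->destroy_calls;
}
```

In this model toy_shutdown_destroy_calls() returns 2: once from the parent's sub-pool pass and once from the cleanup that runs afterwards.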
Created attachment 19227 [details] slightly changed for http_request.h include fix I should note that these patches also fix several things I discovered while testing the no-threads persist and no-persist cases. The ap_dbd_[c]acquire() no-threads functions used the wrong value in their apr_pool_cleanup_register() calls, leading to segfaults when the cleanups ran. Also, I noticed that in the no-threads persist case (using the worker MPM) that DB connections were never closed -- not even on shutdown. This turns out to be because apr_pool_destroy() isn't called on s->process->pool when a child process exits; the worker MPM, at least, only performs apr_pool_destroy() on its pchild pool. Most other MPMs seem similar. (Maybe beos MPM doesn't call child_init hooks at all? Yikes.) Anyway, if we stash the per-process lifetime pool at child_init time, that does get destroyed by each child process and it also clarifies the code a little.
Created attachment 19228 [details] same thing for 2.2.x branch
I think I got a little confused by mod_dbd - in fact I think that the MPMs are OK. The problem, as I understand it now, is strictly in mod_dbd. It has two different ap_dbd_open functions: one for APR_HAS_THREADS and one for the other case. The no-threads ap_dbd_open requires the particular pool to be explicitly destroyed (it doesn't use a reslist), so dbd_close has an apr_pool_destroy. However, the threads version uses a reslist and calls dbd_close as a resource destructor, bombing itself (I hadn't noticed this yesterday). So, probably the simplistic patch for mod_dbd should do it.
Created attachment 19229 [details] Conditionally comment out the explicit pool destructor
Please take a look at the patches I attached today, and read over the comments I put in the patched code, where I tried to explain what I think should be happening. The reason dbd_close() had an apr_pool_destroy() is not just because of the no-threads cases; it's also because the reslist may decide to call dbd_destruct() on excess or invalid resources at any time, and when it does, you want that to clean up the per-connection pool (the one used for prepared statements for that connection). However, I suspect your crashes are coming about because of the particular sequence of events on shutdown, when the apr_pool_destroy() called in the MPM on the top-level pool effectively calls apr_pool_destroy() on the per-connection sub-pools first -- then asks the reslist to clean itself up, at which point it calls dbd_destruct(), which does the second bad apr_pool_destroy(). But in general, dbd_destruct() does need to do that apr_pool_destroy(), so we need some signalling around that. Please try with the patches I provided and see if they deal with your particular issue; if not, let me know and we can probe further. Thanks!
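One possible shape for the "signalling" mentioned above, as a minimal sketch only. It is an assumption, not taken from the attached patches, whose contents are not shown in this thread. All mini_* names are hypothetical and the pool is a toy, not apr_pool_t. The idea is that the sub-pool carries its own cleanup which clears the connection's pool pointer, so a later close can tell the pool is already gone.

```c
#include <assert.h>
#include <stddef.h>

/* Toy pool with one sub-pool and one cleanup. NOT the real apr_pool_t. */
typedef struct mini_pool mini_pool;

struct mini_pool {
    mini_pool *parent;
    mini_pool *child;
    void (*cleanup)(void *data);
    void *cleanup_data;
    int destroy_calls;
};

/* Sub-pool first, then this pool's cleanup, then detach from the parent. */
static void mini_pool_destroy(mini_pool *p)
{
    assert(p != NULL);
    p->destroy_calls++;
    if (p->child != NULL) {
        mini_pool_destroy(p->child);
    }
    if (p->cleanup != NULL) {
        p->cleanup(p->cleanup_data);
    }
    if (p->parent != NULL) {
        p->parent->child = NULL;
    }
}

/* Stand-in for a DB connection; pool is NULL once the sub-pool is gone. */
typedef struct {
    mini_pool *pool;
} mini_conn;

/* Cleanup registered on the sub-pool itself: the "signal". */
static void mini_conn_pool_gone(void *data)
{
    mini_conn *conn = data;

    conn->pool = NULL;
}

/* Guarded close: destroy the sub-pool only if nobody else already did. */
static void mini_conn_close(void *data)
{
    mini_conn *conn = data;

    if (conn->pool != NULL) {
        mini_pool_destroy(conn->pool);
    }
}

/* Returns how many times the sub-pool was destroyed. With expire_first set,
 * the connection is closed early (as a reslist expiring a resource would do)
 * before the parent pool is destroyed. */
static int mini_destroy_calls(int expire_first)
{
    mini_pool parent = {0};
    mini_pool sub = {0};
    mini_conn conn;

    sub.parent = &parent;
    parent.child = &sub;
    conn.pool = &sub;
    sub.cleanup = mini_conn_pool_gone;
    sub.cleanup_data = &conn;
    parent.cleanup = mini_conn_close;
    parent.cleanup_data = &conn;

    if (expire_first) {
        mini_conn_close(&conn);
    }
    mini_pool_destroy(&parent);
    return sub.destroy_calls;
}
```

In this model the sub-pool is destroyed exactly once in both orders: shutdown alone (mini_destroy_calls(0)) and early expiry followed by shutdown (mini_destroy_calls(1)).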
Yes, I checked your patch (19228) and it appears to work. I can't check it now with mpm_worker, but I'll do it some time later.
Chris, please don't change the bug assignee field! It's the only way the bugs@ mailing list gets CCed on changes.
Oops, sorry, didn't know! Thanks for catching that.
Created attachment 19238 [details] catch some error condition leaks in dbd_construct
The patches attached so far don't completely solve all the problems identified in the report. For the latest, see the first three patches (1tidy, 2misc, and 3pools) for httpd trunk in: http://people.apache.org/~chrisd/patches/mod_dbd_pools_groups/ These should be applied sequentially. The third (3pools) patch is the one that actually addresses the original issue from this report. It makes use of the fact that MPMs (should) call apr_pool_destroy() on the pool they pass to the child_init hook functions, when each child process exits. However, they don't call apr_pool_destroy() on s->process->pool in child processes. This allows us to work around the interaction of APR reslists and memory pools at pool destruction time.
The issues raised in this bug report should, I hope, be fixed in this revision of mod_dbd in httpd trunk: http://svn.apache.org/viewvc?view=rev&revision=496831 Note that the APR DBD Oracle driver needs the patch attached in PR #41250 to work with this mod_dbd fix. No other APR DBD drivers use memory sub-pools.
Created attachment 19438 [details] same thing for 2.2.x branch
I hope I got the backport right. It works for me, so far. I didn't test other drivers than mysql, but I tried it with both prefork and worker. My testcase is as described below. The entire conf is:
--------------
User wwwrun
Group www
ErrorLog /var/log/apache2/error_log
Listen 81
DBDriver mysql
DBDParams "host=localhost, user=theuser, pass=thepass, dbname=thedbname"
<Directory /srv/www/htdocs/mysql-test>
    AuthType Basic
    AuthName "MySQL Testing"
    AuthBasicProvider dbd
    Require valid-user
    AuthDBDUserPWQuery "select password from user where username=%s"
</Directory>
---------------
The server is built with the 2.2.4 tarball, and putting http://apache.webthing.com/svn/apache/apr/apr_dbd_mysql.c into srclib/apr-util. In order to build the mysql driver, I used the following patch:
Index: build-outputs.mk
===================================================================
--- build-outputs.mk.orig 2006-11-29 12:48:46.000000000 +0100
+++ build-outputs.mk 2006-12-13 17:17:33.413014156 +0100
@@ -45,8 +45,9 @@
 dbd/apr_dbd.lo: dbd/apr_dbd.c .make.dirs
 dbd/apr_dbd_sqlite2.lo: dbd/apr_dbd_sqlite2.c .make.dirs
 dbd/apr_dbd_sqlite3.lo: dbd/apr_dbd_sqlite3.c .make.dirs
 dbd/apr_dbd_pgsql.lo: dbd/apr_dbd_pgsql.c .make.dirs
+dbd/apr_dbd_mysql.lo: dbd/apr_dbd_mysql.c .make.dirs include/apu_version.h
-OBJECTS_all = buckets/apr_buckets_pipe.lo buckets/apr_buckets_flush.lo buckets/apr_buckets_alloc.lo buckets/apr_buckets_pool.lo buckets/apr_buckets_socket.lo buckets/apr_buckets_heap.lo buckets/apr_buckets_simple.lo buckets/apr_buckets_file.lo buckets/apr_buckets.lo buckets/apr_buckets_mmap.lo buckets/apr_buckets_eos.lo buckets/apr_brigade.lo buckets/apr_buckets_refcount.lo crypto/apr_sha1.lo crypto/uuid.lo crypto/getuuid.lo crypto/apr_md5.lo crypto/apr_md4.lo dbm/apr_dbm.lo dbm/apr_dbm_berkeleydb.lo dbm/apr_dbm_gdbm.lo dbm/apr_dbm_ndbm.lo dbm/apr_dbm_sdbm.lo dbm/sdbm/sdbm_pair.lo dbm/sdbm/sdbm.lo dbm/sdbm/sdbm_hash.lo dbm/sdbm/sdbm_lock.lo encoding/apr_base64.lo hooks/apr_hooks.lo ldap/apr_ldap_url.lo ldap/apr_ldap_option.lo ldap/apr_ldap_init.lo misc/apr_reslist.lo misc/apu_version.lo misc/apr_date.lo misc/apr_rmm.lo misc/apr_queue.lo uri/apr_uri.lo xml/apr_xml.lo strmatch/apr_strmatch.lo xlate/xlate.lo dbd/apr_dbd.lo dbd/apr_dbd_sqlite2.lo dbd/apr_dbd_sqlite3.lo dbd/apr_dbd_pgsql.lo
+OBJECTS_all = buckets/apr_buckets_pipe.lo buckets/apr_buckets_flush.lo buckets/apr_buckets_alloc.lo buckets/apr_buckets_pool.lo buckets/apr_buckets_socket.lo buckets/apr_buckets_heap.lo buckets/apr_buckets_simple.lo buckets/apr_buckets_file.lo buckets/apr_buckets.lo buckets/apr_buckets_mmap.lo buckets/apr_buckets_eos.lo buckets/apr_brigade.lo buckets/apr_buckets_refcount.lo crypto/apr_sha1.lo crypto/uuid.lo crypto/getuuid.lo crypto/apr_md5.lo crypto/apr_md4.lo dbm/apr_dbm.lo dbm/apr_dbm_berkeleydb.lo dbm/apr_dbm_gdbm.lo dbm/apr_dbm_ndbm.lo dbm/apr_dbm_sdbm.lo dbm/sdbm/sdbm_pair.lo dbm/sdbm/sdbm.lo dbm/sdbm/sdbm_hash.lo dbm/sdbm/sdbm_lock.lo encoding/apr_base64.lo hooks/apr_hooks.lo ldap/apr_ldap_url.lo ldap/apr_ldap_option.lo ldap/apr_ldap_init.lo misc/apr_reslist.lo misc/apu_version.lo misc/apr_date.lo misc/apr_rmm.lo misc/apr_queue.lo uri/apr_uri.lo xml/apr_xml.lo strmatch/apr_strmatch.lo xlate/xlate.lo dbd/apr_dbd.lo dbd/apr_dbd_sqlite2.lo dbd/apr_dbd_sqlite3.lo dbd/apr_dbd_pgsql.lo dbd/apr_dbd_mysql.lo
 OBJECTS_unix = $(OBJECTS_all)
Then I did
./configure --enable-dbd --enable-authn_dbd --with-included-apr && \
make && make install
The database account data must be valid. In order to reproduce the bug, I used the commands
/usr/local/apache2/bin/httpd
kill $(cat /usr/local/apache2/logs/httpd.pid)
Meanwhile, I run the trunk version of mod_dbd.c (r531611) with 2.2, which works fine for me.