Bug 24801 - Apache crashes when distinct users exceeds LDAPCacheEntries
Summary: Apache crashes when distinct users exceeds LDAPCacheEntries
Status: CLOSED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_ldap
Version: 2.0.47
Hardware: PC All
Importance: P3 critical
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Duplicates: 29207
Depends on:
Blocks:
 
Reported: 2003-11-18 22:36 UTC by Jess Holle
Modified: 2004-11-29 16:32 UTC
CC List: 1 user



Attachments
Add checking for NULL in *_rmm_* functions (1.75 KB, patch)
2004-05-22 02:06 UTC, Graham Leggett
Add sanity check so that we don't overflow if purge fails for any reason (3.21 KB, patch)
2004-05-22 02:44 UTC, Graham Leggett
Fix to util_ald_cache_purge() to relink lists properly (993 bytes, patch)
2004-09-21 16:51 UTC, Jess Holle

Description Jess Holle 2003-11-18 22:36:21 UTC
The following *is* true with Apache 2.0.47 on Windows.  It *may* well be true 
on other platforms as well -- I've not done sufficient testing to say for 
certain.

Apache crashes when the number of distinct users authenticating against LDAP 
exceeds the setting used for LDAPCacheEntries.  This does not always occur on 
first exceeding this cache size, but in my experience it will invariably occur 
after a few occurrences of exceeding the cache size.

A little debugging strongly suggests that there is an issue with the code which 
removes old entries from the cache in this case.

The workaround is either to use a value of 0 for LDAPCacheEntries, i.e. disable 
the cache, or use a value that is larger than your user population plus some 
safety factor.  The safety factor is necessary in that it appears to be 
possible to have more than one entry for a given user in the cache.  This 
appears to occur when one request is using the user entry when another request 
for authenticating the same user comes in.

This issue is masked by bug #24800 and cannot be reached until you work around 
it.
Comment 1 Jess Holle 2004-05-21 17:00:25 UTC
P.S.  I believe this issue may still be masked by an undersized shared
memory block even though bug #24800 appears to be fixed in 2.0.49.

For instance with:

  LDAPCacheEntries 2150
  # Next line was necessary last I checked as 0 caused issues with active cache
  LDAPOpCacheEntries  1
  LDAPSharedCacheSize 865000
  LDAPSharedCacheFile logs/mod_ldap_cache

I get a child process crash once I get to somewhere between 2151 and 2155
distinct users.
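
For a rough sense of the memory budget in this configuration, assume (purely for illustration) that the entire LDAPSharedCacheSize segment went to the 2150 LDAPCacheEntries entries. Each entry would then get at most:

```latex
\frac{865000\ \text{bytes}}{2150\ \text{entries}} \approx 402.3\ \text{bytes per entry}
```

In practice the figure is lower, since the same segment presumably also holds the operation caches and the allocator's own bookkeeping, so whether roughly 400 bytes suffices depends on the length of the DNs and other strings cached per user.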

Finally, I'm pretty sure I verified that this issue exists on Solaris and AIX as
well -- but I clearly forgot to note it here.
Comment 2 Graham Leggett 2004-05-21 18:10:29 UTC
Trying to look at this now, although I'm not that familiar with the cache code.
Do you have an example of a stacktrace where the crash is occurring?

I'm trying to work out why the problem would be in cache cleanup rather than in
adding to the cache - maybe it's an edge case somewhere in the cleanup?
Comment 3 Joe Orton 2004-05-21 20:27:33 UTC
It's a long-standing bug that the shared memory caching code does not check for
the apr_rmm_*alloc functions returning NULL, so it will of course die horribly
if the rmm segment fills up and the code tries to allocate more:

  return (void *)apr_rmm_addr_get(cache->rmm_addr,
                                  apr_rmm_calloc(cache->rmm_addr, size));
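
To illustrate the failure mode, here is a self-contained sketch in plain C that does not use APR. It models a relocatable segment whose allocator hands out offsets and reserves offset 0 to signal failure, which is the convention the line above appears to rely on (an assumption of this sketch). Turning a failed offset directly into an address yields the segment base rather than NULL, so later writes land on the segment header; the checked version returns NULL instead. All names (`toy_rmm_t`, `toy_rmm_calloc`, `toy_rmm_addr_get`, `cache_alloc`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a relocatable shared-memory segment:
 * blocks are identified by their offset from `base`, and the first
 * TOY_RMM_HEADER bytes are reserved so that offset 0 can mean
 * "allocation failed". */
#define TOY_RMM_HEADER 16

typedef size_t toy_rmm_off_t;

typedef struct {
    char  *base;  /* start of the segment */
    size_t size;  /* total bytes, assumed larger than TOY_RMM_HEADER */
    size_t next;  /* next free offset (simple bump allocator) */
} toy_rmm_t;

static void toy_rmm_init(toy_rmm_t *rmm, char *buf, size_t size)
{
    rmm->base = buf;
    rmm->size = size;
    rmm->next = TOY_RMM_HEADER;
}

/* Returns the offset of a zeroed block, or 0 if the segment is full. */
static toy_rmm_off_t toy_rmm_calloc(toy_rmm_t *rmm, size_t n)
{
    toy_rmm_off_t off;

    if (n == 0 || n > rmm->size - rmm->next)
        return 0;
    off = rmm->next;
    memset(rmm->base + off, 0, n);
    rmm->next += n;
    return off;
}

/* Offset -> address with no validity check, like the quoted line. */
static void *toy_rmm_addr_get(toy_rmm_t *rmm, toy_rmm_off_t off)
{
    return rmm->base + off;
}

/* Checked version: only translate offsets that denote a real block,
 * so an exhausted segment is reported to the caller as NULL. */
static void *cache_alloc(toy_rmm_t *rmm, size_t size)
{
    toy_rmm_off_t off = toy_rmm_calloc(rmm, size);
    return off ? toy_rmm_addr_get(rmm, off) : NULL;
}
```

Callers of a checked allocator like `cache_alloc()` then have to handle NULL, for example by skipping the cache insert.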
Comment 4 Jess Holle 2004-05-21 20:43:56 UTC
That is a separate bug -- which I believe has been fixed in/by 2.0.49 -- at
least my test case for it no longer failed there.

This bug is about the case where the physical shared memory bytes are sufficient
but the specified logical cache size (i.e. # of entries) is not.

In this case, the cache should simply purge older entries.  Instead it crashes
(attempting to do this).  I've been meaning to generate a stack trace, but have
not managed yet.
Comment 5 Graham Leggett 2004-05-22 02:06:29 UTC
Created attachment 11633 [details]
Add checking for NULL in *_rmm_* functions
Comment 6 Graham Leggett 2004-05-22 02:06:57 UTC
Does this patch make any difference for you?
Comment 7 Graham Leggett 2004-05-22 02:43:20 UTC
util_ald_cache_insert() attempts to add an item to the cache. There is no
check for whether the cache is full, because it is assumed that on the edge case
(of the very last cache entry being allocated) util_ald_cache_purge() will run,
which again is assumed to bring down the cache size.

So in this case, it looks like util_ald_cache_purge() is not bringing down the
cache size, so on the next entry we overflow.

Try this patch and see if it makes a difference - it checks for overflow before
we add, not after. The purge code is probably still broken, but at least we
won't segfault.
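
The ordering described here (check capacity, purge, and only then add) can be sketched as follows. This is a simplified model with hypothetical names, not the actual util_ald_cache_insert() code: only the entry counter is modelled, and the two purge functions stand in for a broken purge and a working one.

```c
/* Hypothetical, heavily simplified cache: only the entry counter is
 * modelled, since that is what the overflow check looks at. */
typedef struct toy_cache {
    unsigned long numentries;  /* entries currently stored */
    unsigned long maxentries;  /* logical limit, cf. LDAPCacheEntries */
    void (*purge)(struct toy_cache *cache);  /* may free nothing */
} toy_cache_t;

/* Check BEFORE adding: purge if full, and if the purge did not make
 * room, decline to cache the entry instead of overflowing.
 * Returns 1 if the entry was added, 0 if it was skipped. */
static int toy_cache_insert(toy_cache_t *cache)
{
    if (cache->numentries >= cache->maxentries) {
        cache->purge(cache);
        if (cache->numentries >= cache->maxentries)
            return 0;
    }
    /* ...allocate the node and link it into its bucket here... */
    cache->numentries++;
    return 1;
}

/* Models a broken purge that frees nothing. */
static void purge_noop(toy_cache_t *cache)
{
    (void)cache;
}

/* Models a working purge that drops half of the entries. */
static void purge_half(toy_cache_t *cache)
{
    cache->numentries /= 2;
}
```

In this model a failed purge only means the new entry is not cached, rather than a write past the limit.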
Comment 8 Graham Leggett 2004-05-22 02:44:19 UTC
Created attachment 11634 [details]
Add sanity check so that we don't overflow if purge fails for any reason
Comment 9 Graham Leggett 2004-05-23 22:29:34 UTC
Just committed the above patches to the v2.1.0-dev tree, as they stomp on the
segfaults.

The cache problem remains, however: if the cache sizes are set to 1, mod_auth_ldap
starts returning auth failures.
Comment 10 Jess Holle 2004-05-24 20:07:26 UTC
I applied the patch provided to 2.0.49 sources (the latest I had readily
available) and get a crash with the following traceback (on Windows).  Note this
was for user 2161 with a cache size of 2150.  Also note that this executable
also includes the latest patches for util_ldap.c [for authenticated LDAP server
access] and mod_auth_ldap.c [for avoiding double-escaping with Microsoft's LDAP
SDK].

util_ldap_dn_compare_node_compare(void * 0x00815b98, void * 0x04d4de80) line 91 + 12 bytes
util_ald_cache_fetch(util_ald_cache * 0x00d8008c, void * 0x04d4de80) line 351 + 17 bytes
util_ldap_cache_checkuserid(request_rec * 0x6fb51341, util_ldap_connection_t * 0x007dd1e8, const char * 0x0078ced0, const char * 0x007799c8, int 7991832, char * * 0x00000002, const char * 0x00000000, const char * 0x04d4def0, const char * * 0x007dee59, const char * * * 0x04d4dee4) line 766 + 22 bytes
mod_auth_ldap_check_user_id(request_rec * 0x6ff10e5f) line 334
ap_run_check_user_id(request_rec * 0x007dd1e8) line 69 + 31 bytes
ap_process_request_internal(request_rec * 0x6ff0d6f8) line 193 + 6 bytes
ap_process_request(request_rec * 0x007dd1e8) line 245
ap_process_http_connection(conn_rec * 0x6ff0423f) line 250 + 6 bytes
ap_run_process_connection(conn_rec * 0x007c8ab8) line 42 + 31 bytes
ap_process_connection(conn_rec * 0x007c8ab8, void * 0x007c89e8) line 175 + 6 bytes
worker_main(long 2013300156) line 718
MSVCRT! 780085bc()
KERNEL32! 7c581af6()

Once I let this process die a new child process is created and the test set (of
2500 users) works fine.

For testing this sort of thing, I recommend just exporting a single user (with
password) from LDAP and using this export as a template to programmatically
create many users with the same attributes except for the user name.  You can
then use a simple program, script, or even Ant to attempt to fetch an
authenticated resource on behalf of each user in turn.
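
A minimal sketch of the template step, assuming the exported entry has been edited so that every occurrence of the user name is replaced by a `{uid}` placeholder. The placeholder, the `testuserNNNN` naming scheme, and the function names are hypothetical; the output is LDIF text that could be fed to whatever import tool the directory provides.

```c
#include <stdio.h>
#include <string.h>

/* Copy `tmpl` into `out`, replacing each occurrence of the placeholder
 * `ph` with `value`. Returns 0 on success, -1 on bad arguments or if
 * `out` (of `outlen` bytes) is too small. */
static int fill_template(char *out, size_t outlen, const char *tmpl,
                         const char *ph, const char *value)
{
    size_t phlen = strlen(ph), vlen = strlen(value), used = 0;

    if (outlen == 0 || phlen == 0)
        return -1;
    while (*tmpl) {
        const char *src = tmpl;  /* default: copy one template char */
        size_t n = 1;

        if (strncmp(tmpl, ph, phlen) == 0) {
            src = value;         /* placeholder: copy the value */
            n = vlen;
            tmpl += phlen;
        } else {
            tmpl++;
        }
        if (used + n >= outlen)  /* keep room for the terminator */
            return -1;
        memcpy(out + used, src, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}

/* Write `count` LDIF entries to `f`, named testuser0001, testuser0002,
 * ... Assumes `tmpl` ends with a newline; the extra "\n" below adds
 * the blank line that separates LDIF entries. */
static int write_users(FILE *f, const char *tmpl, int count)
{
    char name[32], entry[4096];

    for (int i = 1; i <= count; i++) {
        snprintf(name, sizeof name, "testuser%04d", i);
        if (fill_template(entry, sizeof entry, tmpl, "{uid}", name) != 0)
            return -1;
        fprintf(f, "%s\n", entry);
    }
    return 0;
}
```

For example, `write_users(stdout, tmpl, 2500)` would emit 2500 entries, enough to exceed the 2150-entry LDAPCacheEntries setting used above.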
Comment 11 Graham Leggett 2004-05-25 17:54:57 UTC
Patches to fix segfaults in the cache code were applied to v2.1.0-dev and
v2.0.50-dev. Testing this by reducing the cache sizes to 1 shows that
the segfaults are gone, but the mod_auth_ldap module is returning an auth fail
when it shouldn't, and the cache gets full and stays full.

I have created a new bug report for this: 29207.


*** This bug has been marked as a duplicate of 29207 ***
Comment 12 Jess Holle 2004-07-09 16:14:33 UTC
> Note that the last time I tested the cache entry overflow it still
> crashed when I threw 2500 unique user login attempts at a 2150
> entry cache.  This is more representative of our real use cases
> than 5 unique users against a single user entry cache or the like
> and I've not had a chance to (or much interest in) testing this
> particular case.

I've built an Apache 2.0.50 from sources for Windows (to get HTTPS support, of
course, plus tiny extensions to mod_deflate and sockopt -- which is missing
send-buffer-size configurability on Windows) and re-ran the test noted above.

I get a 100% repeatable crash at around user 2160, i.e. the buffer overflow is
*not* fixed, at least not on Windows.  [I can test Solaris and AIX when I get
those binaries built.]

In short, this bug is *not* fixed in 2.0.50.
Comment 13 Jess Holle 2004-09-21 16:51:07 UTC
Created attachment 12817 [details]
Fix to util_ald_cache_purge() to relink lists properly
Comment 14 Jess Holle 2004-09-21 16:54:32 UTC
As per the last comment, I have found the problem behind this bug:
util_ald_cache_purge() simply never relinked the linked list entries during
cache purge.  Instead it freed various elements in the linked list without
updating any linked list pointers, thus begging for trouble as the memory is
reused, etc...
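
The difference between freeing nodes and unlinking them can be shown with a small chained-hash sketch. It is an illustration under assumptions, not the attached patch: the node layout, the `cutoff` age parameter, and all names are hypothetical. The pointer-to-pointer `pp` always addresses the link that currently points at `p` (a bucket head or the previous node's `next`), so each node is spliced out of its list before its memory is released and can be reused.

```c
#include <stdlib.h>

/* Hypothetical chained-hash cache: each bucket is a singly linked
 * list threaded through `next`. */
typedef struct toy_node {
    long             add_time;  /* when the entry was cached */
    struct toy_node *next;
} toy_node_t;

typedef struct {
    toy_node_t    **nodes;       /* array of `size` bucket heads */
    unsigned long   size;
    unsigned long   numentries;  /* live entries across all buckets */
} toy_cache_t;

/* Drop every entry added before `cutoff`. Before a node is freed,
 * the link that pointed at it (bucket head or the previous node's
 * `next`) is redirected to its successor, so no surviving pointer
 * refers to released memory. */
static void toy_cache_purge(toy_cache_t *cache, long cutoff)
{
    for (unsigned long i = 0; i < cache->size; i++) {
        toy_node_t **pp = &cache->nodes[i];  /* link to p */

        while (*pp) {
            toy_node_t *p = *pp;

            if (p->add_time < cutoff) {
                *pp = p->next;   /* relink around p ... */
                free(p);         /* ... then release it */
                cache->numentries--;
            } else {
                pp = &p->next;   /* keep p, move to its link */
            }
        }
    }
}
```

Omitting the `*pp = p->next` step leaves the bucket head or the predecessor pointing at freed memory, which is exactly the hazard once that memory is handed out again.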

Also, I know this has been resolved as "duplicate", but the fix I have found
proves that the problem was not limited to "duplicate" bug 29207.  I am thus
reopening this until someone commits my patch.
Comment 15 Brad Nicholes 2004-09-29 17:52:40 UTC
The final patch for this bug, which fixes the util_ald_cache_purge() relink
problem has been backported and posted.  See 
dist/httpd/patches/apply_to_2.0.52.
Comment 16 Graham Leggett 2004-10-03 16:17:46 UTC
*** Bug 29207 has been marked as a duplicate of this bug. ***