39095 – ldap_simple_bind_s fails

Bug 39095 - ldap_simple_bind_s fails

Status:	RESOLVED FIXED

Alias:	None

Product:	Apache httpd-2
Classification:	Unclassified
Component:	mod_ldap
Version:	2.0.55
Hardware:	PC Windows Server 2003

Importance:	P2 major with 23 votes
Target Milestone:	---
Assignee:	Apache HTTPD Bugs Mailing List

URL:
Keywords:	PatchAvailable

Duplicates (1):	42498
Depends on:
Blocks:

Reported:	2006-03-24 13:20 UTC by Christian Kn
Modified:	2012-11-12 18:07 UTC
CC List:	7 users

Attachments
use macro for LDAP_SERVER_DOWN (4.53 KB, patch) 2007-11-12 14:41 UTC, Eric Covener
try 2 (4.53 KB, patch) 2007-11-14 03:57 UTC, Eric Covener
Patch for Apache 2.2.6 (tested successfully with Apache 2.2.4 as well) (1.08 KB, patch) 2007-11-15 05:59 UTC, sasha

Description Christian Kn 2006-03-24 13:20:47 UTC

There seems to be a problem with pooled connections. Everything works fine, but
after a period with no authentications, ldap_simple_bind_s fails
with code 0x34 (LDAP_UNAVAILABLE). This causes the user to be reprompted for
their password.
Observing the pooled LDAP connection with netstat gives some interesting
information. While everything is working, the state of the TCP socket to the
LDAP server is ESTABLISHED. After a while the state changes to CLOSE_WAIT. After
this state change, the next authentication fails once.
This may happen because the remote server closes the connection.
I'm working in an Active Directory environment.

Comment 1 Jeff Trawick 2006-03-24 18:19:15 UTC

What LDAP client library is this?  2.0.55 LDAP checks for SERVER_DOWN retcode
and retries.  That's the retcode I've seen with a couple of LDAP client
libraries when the LDAP server has dropped the connection and a subsequent
attempt to use the connection fails.

Comment 2 Christian Kn 2006-03-24 20:24:10 UTC

I compiled the LDAP modules myself with the Microsoft Windows Server 2003 SP1
Platform SDK. The regular binaries didn't work at all. I think this issue is
covered in another bug.
The short answer is that I'm using the Microsoft LDAP SDK.

Since I really need mod_auth_ldap, I patched the code so that it also retries
when the code is LDAP_UNAVAILABLE. This seems to work.

Comment 3 Chris Jensen 2006-06-14 21:52:15 UTC

Experiencing the same problem.
After about 10+ minutes I'm seeing the following in the error log:
[LDAP: ldap_simple_bind_s() failed][Unavailable]

Users see an Internal Server Error page.
If they refresh the page, it loads again.

It looks like the pooled LDAP connection becomes invalid and an error is displayed.
After the error, the next request reconnects to the LDAP server.

I have tried almost every imaginable LDAPCacheTTL and LDAPOpCacheTTL value
except for disabling caching altogether.

Comment 4 Brad Nicholes 2006-06-20 15:40:40 UTC

(In reply to comment #3)
> After the error, the next request reconnects to the LDAP server.
> I have tried almost every imaginable LDAPCacheTTL and LDAPOpCacheTTL value
> except for disabling caching altogether.

There aren't any LDAPxxx directives that control the connection cache.  The 
connection cache is enabled by default with no way to modify it through 
configuration.  The existing LDAPCachexxx directives only control the various 
credential caches.  The bottom line is that there is no way to disable the LDAP 
connection caching short of adding a new directive to mod_ldap.

The current code implementation is designed to work basically in the manner 
that you have described.  It is designed to choose a connection from the 
cache, attempt to use the connection and if it fails, unbind and mark the 
connection as bad so that the next time it is pulled from the cache, it will 
reconnect.
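
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual mod_ldap code: `pooled_conn`, `pooled_conn_use`, and `fail_first_call` are hypothetical names, and the "operation" is a plain callback rather than a real LDAP call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one cached LDAP connection. */
typedef struct {
    bool bound;      /* true while the cached bind is believed to be usable */
    int  reconnects; /* number of times the connection had to be re-opened */
} pooled_conn;

/* Hypothetical LDAP operation: returns 0 on success, nonzero on failure. */
typedef int (*ldap_op_fn)(void *ctx);

/* Use a cached connection; on failure, mark it bad so the NEXT use reconnects. */
int pooled_conn_use(pooled_conn *c, ldap_op_fn op, void *ctx)
{
    assert(c != NULL && op != NULL);

    if (!c->bound) {
        /* A real implementation would connect and bind again here. */
        c->bound = true;
        c->reconnects++;
    }

    int rc = op(ctx);
    if (rc != 0) {
        /* A real implementation would unbind here; the error still
         * propagates to the current request. */
        c->bound = false;
    }
    return rc;
}

/* Demo operation: fails on the first call (stale socket), succeeds afterwards. */
int fail_first_call(void *ctx)
{
    int *calls = (int *)ctx;
    return (*calls)++ == 0 ? 1 : 0;
}
```

With this design the request that first touches a stale connection fails, and only the following request reconnects, which matches the "error once, then fine after a refresh" symptom reported in this bug.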

Comment 5 J.J. 2006-06-23 19:17:51 UTC

So the only work around is to edit the source code?

(In reply to comment #4)

Comment 6 Arthur 2007-05-24 05:08:27 UTC

This problem is still present in 2.2 on WinXP. Did anyone have any success in
resolving the problem in any way?

Comment 7 Arthur 2007-05-25 03:51:02 UTC

I've had a look at the network traffic. What happens is that after successful
LDAP message exchanges, the TCP connection is torn down from the server side with
a FIN. mod_ldap answers with an ACK and therefore keeps its side of the connection
open. Later on mod_ldap sends a bind request over this half-open connection, the
LDAP server responds with a RST, and Apache throws a 500.

As half open connections don't make much sense for LDAP I'd say it would be
better if mod_ldap would send a FIN-ACK when receiving a FIN and tear down the
connection completely.
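
To illustrate the half-open state described above, here is a small POSIX-only sketch. It uses a Unix-domain socketpair as a stand-in for the TCP connection to the LDAP server, and `peer_has_closed` and `demo_half_close` are hypothetical helper names. It only shows how code that owns the socket could notice that the peer has closed its side; whether mod_ldap could do anything similar depends on what the LDAP client library exposes.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer has shut down its side (recv() reports EOF),
 * 0 if the connection still looks open or the state cannot be determined. */
int peer_has_closed(int fd)
{
    char byte;
    assert(fd >= 0);

    /* MSG_PEEK leaves any pending data in place; MSG_DONTWAIT avoids blocking
     * on an idle connection (recv() then returns -1 with EAGAIN). */
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    return n == 0;
}

/* Demo: create a connected pair, close one end, and probe the other.
 * Returns 1 if the close was detected, 0 if not, -1 on setup failure. */
int demo_half_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    int before = peer_has_closed(sv[0]); /* expected 0: peer still open */
    close(sv[1]);                        /* the "server" closes its end */
    int after = peer_has_closed(sv[0]);  /* expected 1: EOF is visible  */

    close(sv[0]);
    return (before == 0 && after == 1) ? 1 : 0;
}
```

A socket in this state is what netstat reports as CLOSE_WAIT: the remote side has finished sending, but the local side has not yet closed its end.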

Comment 8 Brad Nicholes 2007-05-30 08:05:15 UTC

Part of the problem is that you are looking at the 2.0.x code base of 
mod_ldap.  There really isn't much work going on in the 2.0.x code base 
since mod_authnz_ldap and mod_ldap have moved on significantly in 2.2.x and 
trunk.  This same issue has already been addressed in 2.2.x (see bug #40878) 
but there are no plans to back port the patch to 2.0.x mainly because the code 
bases are very different between 2.0.x and 2.2.x.  I would suggest giving 
2.2.x a try and see if that resolves your issue.

Comment 9 Arthur 2007-05-30 10:10:55 UTC

Thanks for your prompt reply Brad. This is still present on Apache 2.2 on Win XP
connecting to a Windows 2003 AD server. It gives an [LDAP: ldap_simple_bind_s()
failed][Unavailable] error message (as opposed to [LDAP: ldap_simple_bind_s()
failed][Can't contact LDAP server] in bug 40878). The network traffic analysis
from comment #7 was done with the newest available windows binaries.

Comment 10 Brad Nicholes 2007-05-30 11:58:19 UTC

This goes back to Jeff's comment in comment #1.  When mod_ldap attempts to use 
the connection, it evaluates the error code that is returned.  In that 
evaluation, it looks for SERVER_DOWN and then retries.  We have already seen 
several places where the Windows LDAP server is returning error codes that are 
different from OpenLDAP or Novell LDAP.  This looks like another one.  I don't 
have a Windows LDAP server nor am I running Apache on Windows so I really 
don't know what error code the Windows LDAP server is returning.  But my guess 
is that the error code is something other than what is expected.  More 
research would have to be done at the source code level by somebody that is 
running the code on Windows.

Comment 11 Aki Salminen 2007-09-18 01:39:59 UTC

I used the latest 2.2.6 code and added the ldap_simple_bind_s() return code to
the "ldap_simple_bind_s() failed" message with sprintf( "...%d", rc ).

The return code for this case in my logs was 51. I have limited understanding of
sprintf and I was expecting to see a decimal number, but 51 is likely 0x51 (
LDAP_SERVER_DOWN ).
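
One detail worth checking here: `%d` prints a decimal value, so a logged 51 would correspond to 0x33 rather than 0x51. The sketch below formats a code in both bases so the log is unambiguous. The numeric values are assumptions based on the standard LDAP result codes (busy = 51, unavailable = 52) and on the 0x51 value mentioned in this comment for LDAP_SERVER_DOWN; check the headers of the SDK you build against. The `EX_` names and both helper functions are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values; names are prefixed EX_ to avoid clashing with real headers. */
enum {
    EX_LDAP_BUSY        = 0x33, /* 51 decimal */
    EX_LDAP_UNAVAILABLE = 0x34, /* 52 decimal */
    EX_LDAP_SERVER_DOWN = 0x51  /* 81 decimal */
};

/* Map a return code to a symbolic name for logging. */
const char *ex_ldap_rc_name(int rc)
{
    switch (rc) {
    case EX_LDAP_BUSY:        return "LDAP_BUSY";
    case EX_LDAP_UNAVAILABLE: return "LDAP_UNAVAILABLE";
    case EX_LDAP_SERVER_DOWN: return "LDAP_SERVER_DOWN";
    default:                  return "(other)";
    }
}

/* Format a return code unambiguously: decimal, hex, and symbolic name. */
int ex_format_rc(char *buf, size_t len, int rc)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "rc=%d (0x%02x) %s",
                    rc, (unsigned int)rc, ex_ldap_rc_name(rc));
}
```

Logging the code in both decimal and hex, as `ex_format_rc` does, avoids having to guess which base a bare number was printed in.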

TCP traffic sniffing verifies that the LDAP client does try again and that the
second try is successful. But for some reason Apache still sends an Internal
Server Error to the HTTP client. If the HTTP client refreshes after the
"Internal server error", the correct page is served by Apache without any
further TCP traffic to the LDAP server.

In the end I agree with comment #7.

The LDAP retry is successful! But as the MS LDAP server has closed the idle
"half open" connection after 10 min without any notice to the LDAP client,
Apache seems to run into an "Internal server error". The LDAP retry is
successful, but the "Internal server error" is still shown...

Comment 12 sasha 2007-10-10 05:19:44 UTC

It seems that it is known what needs to be done :-) (i.e. "it would be
better if mod_ldap would send a FIN-ACK when receiving a FIN and tear down the
connection completely."). Is there any chance that this is finally going to be
implemented? The bug has been known for 1.5 years!

Comment 13 Brad Nicholes 2007-10-10 06:26:56 UTC

Mod_ldap doesn't deal with network connections at this level.  It is the LDAP 
client library that handles ACKs and FINs.  Mod_ldap doesn't know that 
anything has happened to the network connection until the next time that it 
tries to call an ldap API.  All it can do is handle the error that is 
returned.  In this case, it is already handling the LDAP_SERVER_DOWN error 
code properly by re-establishing the connection as was noted in comment #11.  
The question remains, where is the internal server error message coming from? 
Somewhere there is a Windows specific error code that is not being handled 
properly.  Without a Windows box, I can't tell where that is.  Somebody with 
the ability to debug through the code on Windows, will need to track it down.

Comment 14 sasha 2007-10-10 06:54:06 UTC

(In reply to comment #13)
Brad, thanks for your prompt response.

I fully understand your position (and frankly speaking assume that in the end
<MS AD is to be blamed>). Unfortunately we do not have the possibility to debug
the Windows code. In our environment Apache is the frontend to Subversion and
Active Directory is used "just" to authenticate the users. While an occasional
"Internal Server" error from Apache is not a huge problem for a human user
(after retrying once all works well), it causes SERIOUS problems for automated
tasks, e.g. for Continuous Integration.

I wonder if the solution proposed in comment #2 is a reasonable (dirty)
workaround for the "poor" MS AD users ;-) I already asked at Apache Lounge if
somebody would volunteer to do this job for the community.

Comment 15 Andy Wang 2007-11-12 11:15:55 UTC

We've patched util_ldap.c in our build of Apache so that uldap_connection_open
checks for LDAP_UNAVAILABLE.
The patch is actually conditional on
#if APR_HAS_MICROSOFT_LDAPSDK

The problem is indeed the Microsoft SDK.  When the TCP RST comes in, instead of
returning back to the calling code an LDAP_SERVER_DOWN return code, it returns
LDAP_UNAVAILABLE, which according to the LDAP RFCs is wrong.  LDAP_UNAVAILABLE
is supposed to be a server return code telling the client, "I'm unavailable,
maybe I'm shutting down, maybe I'm in the middle of maintenance".

The SDK APIs should not be interpreting TCP RSTs as an LDAP_UNAVAILABLE. 
However, good luck getting Microsoft to admit and/or fix this.

If there's enough interest, I can pull out our patch (we have multiple patches
against the file and unfortunately no diff files right now), but I think it's
pretty easy to change uldap_connection_open to do the retry on LDAP_UNAVAILABLE
if the MS LDAP SDK is used.
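
The kind of change described above might look roughly like the following. This is only a sketch under assumptions: `EX_HAS_MICROSOFT_LDAPSDK` stands in for APR_HAS_MICROSOFT_LDAPSDK and is forced to 1 here so the example is self-contained, the `EX_` constants use assumed numeric values, and `ex_bind_with_retry` and `ex_stale_then_ok` are hypothetical helpers, not the actual uldap_connection_open code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed numeric values; EX_ names avoid clashing with real LDAP headers. */
#define EX_LDAP_SUCCESS      0x00
#define EX_LDAP_UNAVAILABLE  0x34
#define EX_LDAP_SERVER_DOWN  0x51

/* Stand-in for APR_HAS_MICROSOFT_LDAPSDK (assumption: enabled for the demo). */
#ifndef EX_HAS_MICROSOFT_LDAPSDK
#define EX_HAS_MICROSOFT_LDAPSDK 1
#endif

#if EX_HAS_MICROSOFT_LDAPSDK
/* The Microsoft SDK reports a dropped connection as LDAP_UNAVAILABLE. */
#define EX_IS_SERVER_DOWN(rc) \
    ((rc) == EX_LDAP_SERVER_DOWN || (rc) == EX_LDAP_UNAVAILABLE)
#else
#define EX_IS_SERVER_DOWN(rc) ((rc) == EX_LDAP_SERVER_DOWN)
#endif

/* Hypothetical bind callback: returns an LDAP result code. */
typedef int (*ex_bind_fn)(void *ctx);

/* Try to bind, retrying only when the code indicates a dropped connection. */
int ex_bind_with_retry(ex_bind_fn do_bind, void *ctx, int max_attempts)
{
    int rc = EX_LDAP_SERVER_DOWN;
    assert(do_bind != NULL && max_attempts > 0);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        rc = do_bind(ctx);
        if (!EX_IS_SERVER_DOWN(rc)) {
            break; /* success, or an error that a reconnect will not fix */
        }
        /* A real implementation would unbind and reconnect here. */
    }
    return rc;
}

/* Demo bind: first call hits the stale connection, second call succeeds. */
int ex_stale_then_ok(void *ctx)
{
    int *calls = (int *)ctx;
    return (*calls)++ == 0 ? EX_LDAP_UNAVAILABLE : EX_LDAP_SUCCESS;
}
```

Keeping the extra check behind the SDK conditional means other client libraries keep their existing retry behavior.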

Comment 16 Arthur 2007-11-12 11:39:26 UTC

Andy, that would be wonderful. There are certainly quite a few people who would
like to see a fix (and be it for a bug in MS' implementation).

Comment 17 Eric Covener 2007-11-12 14:41:13 UTC

Created attachment 21121 [details]
use macro for LDAP_SERVER_DOWN

For trunk we'd put the macro in apr-util 1.3.x, for Apache 2.2.x the macro
would really be in util_ldap.c

I assume this is effectively the patch some of the commenters are using, can
someone give it a try?

2.2.x version of patch here:
http://people.apache.org/~covener/2.2.x-ldap-unavailable.diff

Comment 18 Dave Huang 2007-11-13 22:02:34 UTC

(In reply to comment #17)
> Created an attachment (id=21121) [edit]
> use macro for LDAP_SERVER_DOWN

Doesn't this change reverse the sense of the test?

-        if (LDAP_SERVER_DOWN != rc) {
+        if (APR_LDAP_SERVER_DOWN(rc)) {

It should be
        if (!APR_LDAP_SERVER_DOWN(rc)) {
right?
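
The inversion can be checked with a tiny standalone example. The `EX_` names and numeric values below are assumptions for illustration, and the macro only mirrors the Microsoft-SDK variant under discussion; it is not the macro from the attachment.

```c
#include <assert.h>

/* Assumed numeric values; EX_ names avoid clashing with real LDAP headers. */
#define EX_LDAP_UNAVAILABLE  0x34
#define EX_LDAP_SERVER_DOWN  0x51
#define EX_SERVER_DOWN(rc) \
    ((rc) == EX_LDAP_SERVER_DOWN || (rc) == EX_LDAP_UNAVAILABLE)

/* Condition before the patch: true when the error is NOT "server down". */
int ex_cond_original(int rc)  { return EX_LDAP_SERVER_DOWN != rc; }

/* First patch attempt: true when the error IS "server down" (sense reversed). */
int ex_cond_first_try(int rc) { return EX_SERVER_DOWN(rc); }

/* Corrected form: negated macro, same sense as the original. */
int ex_cond_corrected(int rc) { return !EX_SERVER_DOWN(rc); }

/* Sanity check: for SERVER_DOWN the corrected form agrees with the original
 * condition, while the first attempt does not. */
void ex_check_sense(void)
{
    assert(ex_cond_corrected(EX_LDAP_SERVER_DOWN) ==
           ex_cond_original(EX_LDAP_SERVER_DOWN));
    assert(ex_cond_first_try(EX_LDAP_SERVER_DOWN) !=
           ex_cond_original(EX_LDAP_SERVER_DOWN));
}
```

The only code for which the corrected form differs from the original condition is LDAP_UNAVAILABLE, which is the intended change.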

Comment 19 Eric Covener 2007-11-14 03:57:31 UTC

Created attachment 21124 [details]
try 2

thanks khym@azeotrope.org ,

2.2.x: http://people.apache.org/~covener/2.2.x-ldap-unavailable-2.diff

Comment 20 Andy Wang 2007-11-14 10:05:56 UTC

Eric's patch is pretty much what we do, and we've tested it against active
directory and Sun's LDAP server from Windows Apache servers and it seems to do
the right thing.

Comment 21 sasha 2007-11-15 05:59:27 UTC

Created attachment 21129 [details]
Patch for Apache 2.2.6 (tested successfully with Apache 2.2.4 as well)

This was tested to work 100% OK with Apache on Windows (built with Visual
Studio 2005 - download from Apache Lounge).

For more details & explanations see
http://www.apachelounge.com/forum/viewtopic.php?t=1995

Comment 22 Eric Covener 2007-11-15 06:42:57 UTC

I don't think it's wise to have other SDKs retry when they return LDAP_UNAVAILABLE

Comment 23 Andy Wang 2007-11-15 11:48:21 UTC

I agree with Eric.  LDAP_UNAVAILABLE is specifically to tell the ldap client to
go away and in cases where it's working correctly, the LDAP client should not be
retrying.  I prefer the conditional macro definition in the previous patch.

Comment 24 Eric Covener 2007-11-15 12:26:01 UTC

Can someone with APR karma look at this for APR 1.3.x:

http://issues.apache.org/bugzilla/show_bug.cgi?id=43875

(thanks sasha, missed the check in authnz_ldap)

Comment 25 sasha 2007-11-15 13:35:14 UTC

(In reply to comment #24)
> ...
> (thanks sasha, missed the check in authnz_ldap)
> 
You are welcome Eric. Unfortunately MS ActiveDirectory returns LDAP_UNAVAILABLE
when it should not, so this is the possible (dirty, but simple) workaround ;-)

Comment 26 Tom Donovan 2007-12-01 09:30:14 UTC

re: "LDAP_UNAVAILABLE is specifically to tell the ldap client to go away"

It's not clear that this interpretation is universal, or that the handling of
LDAP_UNAVAILABLE must necessarily differ from LDAP_SERVER_DOWN.

OpenLDAP, for example, will substitute LDAP_UNAVAILABLE for LDAP_SERVER_DOWN. 
See http://www.openldap.org/devel/cvsweb.cgi/~checkout~/servers/slapd/result.c
line 1544 (near the bottom).

The OpenLDAP back-server code and test programs retry on LDAP_UNAVAILABLE for
all platforms. See tests/progs/slapd-search.c:186 and
servers/slapd/back-meta/bind.c:705

Comment 27 William A. Rowe Jr. 2007-12-30 22:49:51 UTC

It would be great for the LDAP gurus to wrap their heads around this and reach
a consensus.  Is it time to ping openldap-devel@openldap.org for their official 
position on retries and timing when presented with LDAP_UNAVAILABLE?

Is there a reason this hasn't been at least /conditionally/ committed for win32?

Comment 28 Eric Covener 2007-12-31 09:51:39 UTC

Thanks for revisiting Bill, I've created a separate bug to handle the less
pressing issue of retry on LDAP_UNAVAILABLE/BUSY (on non-MS SDK).

http://issues.apache.org/bugzilla/show_bug.cgi?id=44155

Comment 29 Ruediger Pluem 2008-01-19 11:49:00 UTC

Fixed in 2.2.8.

Comment 30 Eric Covener 2009-05-23 14:56:19 UTC

*** Bug 42498 has been marked as a duplicate of this bug. ***

Comment 31 Mark Phippard 2012-11-12 18:07:49 UTC

Is there any reason that the LDAP change committed to the 2.2.23 release would have caused a regression and made this problem return?

I package Subversion Edge which bundles Apache 2.2.23 + SVN 1.7.7.  The version that includes Apache 2.2.23 was only released on October 26th and we have already had at least a dozen different users reporting they are now getting this problem.

Users will get an HTTP 500 when making a Subversion request, and then subsequent requests for that user will be fine again for a while.  When they get the error, this is what is logged:

[info] [client 204.11.125.146] [1248] auth_ldap authenticate: user XXXXX authentication failed; URI /svn/reposname [LDAP: ldap_simple_bind_s() failed][Unavailable]

This is only happening with Windows Apache servers.  I note that the 2.2.23 release includes this change, which seems innocuous but also seems to be the only change related to LDAP in the CHANGES

http://svn.apache.org/viewvc?view=revision&revision=1375696

--- httpd/httpd/branches/2.2.x/include/util_ldap.h      2012/08/21 17:48:34        1375695
+++ httpd/httpd/branches/2.2.x/include/util_ldap.h      2012/08/21 17:48:58        1375696
@@ -30,7 +30,7 @@
 #include "apr_time.h"
 #include "apr_ldap.h"
-#if APR_HAS_MICROSOFT_LDAPSDK
+#ifdef LDAP_UNAVAILABLE
 #define AP_LDAP_IS_SERVER_DOWN(s)                ((s) == LDAP_SERVER_DOWN \
                 ||(s) == LDAP_UNAVAILABLE)
 #else