There seems to be a problem with pooled connections. Everything works fine, but after a while with no authentications, ldap_simple_bind_s fails with code 0x34 (LDAP_UNAVAILABLE). As a result, the user is prompted for their password again. Observing the pooled LDAP connection with netstat gives some interesting information. While everything is working, the TCP socket state to the LDAP server is ESTABLISHED. After a while the state changes to CLOSE_WAIT. After this state change, authentication fails once. Maybe this happens because the remote server closes the connection. I'm working in an Active Directory environment.
What LDAP client library is this? 2.0.55 LDAP checks for SERVER_DOWN retcode and retries. That's the retcode I've seen with a couple of LDAP client libraries when the LDAP server has dropped the connection and a subsequent attempt to use the connection fails.
I compiled the LDAP modules myself with the Microsoft Windows Server 2003 SP1 Platform SDK. The regular binaries didn't work at all. I think that issue is covered in another bug. Short answer: I'm using the Microsoft LDAP SDK. Since I really need mod_auth_ldap, I patched the code so it also retries when the code is LDAP_UNAVAILABLE. This seems to work.
Experiencing the same problem. After about 10+ minutes I'm seeing the following in the error log:
[LDAP: ldap_simple_bind_s() failed][Unavailable]
Users see an Internal Server Error page. If they refresh, the page loads again. It looks like the LDAP connection pool becomes invalid and an error is displayed. After the error, the next request reconnects to the LDAP server. I have tried almost every imaginable LDAPCacheTTL and LDAPOpCacheTTL value except for disabling caching altogether.
(In reply to comment #3)
> After the error, the next request reconnects to the ldap.
> I have tried almost every imaginable LDAPCacheTTL and LDAPOpCacheTTL value
> except for disabling caching altogether.
There aren't any LDAPxxx directives that control the connection cache. The connection cache is enabled by default with no way to modify it through configuration. The existing LDAPCachexxx directives only control the various credential caches. The bottom line is that there is no way to disable LDAP connection caching short of adding a new directive to mod_ldap. The current code is designed to work basically in the manner that you have described: it chooses a connection from the cache and attempts to use it; if that fails, it unbinds and marks the connection as bad so that the next time it is pulled from the cache, it reconnects.
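The cache behaviour described above can be sketched roughly as follows. This is a minimal illustration only, not mod_ldap code: the `cached_conn` struct, its `stale` flag (standing in for "the server has silently closed the socket") and `handle_request` are all hypothetical names invented for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a cached connection; not a mod_ldap type. */
typedef struct {
    bool bound;     /* true while the cached handle is believed usable */
    bool stale;     /* true once the server has dropped its side */
    int  connects;  /* number of (re)connects, for illustration only */
} cached_conn;

/* Returns 0 on success, -1 if this request fails. */
static int handle_request(cached_conn *c)
{
    if (!c->bound) {
        /* Marked bad by an earlier failure: reconnect before use. */
        c->bound = true;
        c->stale = false;
        c->connects++;
    }
    if (c->stale) {
        /* The operation fails; unbind and mark bad, but do not retry. */
        c->bound = false;
        return -1;
    }
    return 0;
}
```

Under these assumptions, one request fails after the server drops the connection and the following request succeeds after a reconnect, which matches the symptom reported in this bug.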
So the only workaround is to edit the source code? (In reply to comment #4)
This problem is still present in 2.2 on WinXP. Did anyone have any success in resolving the problem in any way?
I've had a look at the network traffic. What happens is that after successful LDAP message exchanges, the TCP connection is torn down from the server side with a FIN. mod_ldap answers with an ACK and therefore keeps its side of the connection open. Later on, mod_ldap sends a bind request over this half-open connection, the LDAP server responds with a RST, and Apache throws a 500. As half-open connections don't make much sense for LDAP, I'd say it would be better if mod_ldap sent a FIN-ACK when receiving a FIN and tore down the connection completely.
Part of the problem is that you are looking at the 2.0.x code base of mod_ldap. There really isn't much work going on in the 2.0.x code base, since mod_authnz_ldap and mod_ldap have moved on significantly in 2.2.x and trunk. This same issue has already been addressed in 2.2.x (see bug #40878), but there are no plans to backport the patch to 2.0.x, mainly because the code bases are very different between 2.0.x and 2.2.x. I would suggest giving 2.2.x a try to see if that resolves your issue.
Thanks for your prompt reply Brad. This is still present on Apache 2.2 on Win XP connecting to a Windows 2003 AD server. It gives an [LDAP: ldap_simple_bind_s() failed][Unavailable] error message (as opposed to [LDAP: ldap_simple_bind_s() failed][Can't contact LDAP server] in bug 40878). The network traffic analysis from comment #7 was done with the newest available windows binaries.
This goes back to Jeff's comment in comment #2. When mod_ldap attempts to use the connection, it evaluates the error code that is returned. In that evaluation, it looks for SERVER_DOWN and then retries. We have already seen several places where the Windows LDAP server returns error codes that are different from OpenLDAP or Novell LDAP. This looks like another one. I don't have a Windows LDAP server, nor am I running Apache on Windows, so I really don't know what error code the Windows LDAP server is returning. But my guess is that the error code is something other than what is expected. More research would have to be done at the source code level by somebody who is running the code on Windows.
I used the latest 2.2.6 code and added the ldap_simple_bind_s() return code to the "ldap_simple_bind_s() failed" message with sprintf( "...%d", rc ). The return code for this case in my logs was 51. I have limited understanding of sprintf and was expecting to see a decimal number, but 51 is likely 0x51 ( LDAP_SERVER_DOWN ). TCP traffic sniffing confirms that the LDAP client does try again and that the second try is successful. But for some reason Apache still sends an Internal Server Error to the HTTP client. If the HTTP client refreshes after the "Internal server error", Apache serves the correct page without any further TCP traffic to the LDAP server. In the end I agree with comment #7. The LDAP retry is successful! But because the MS LDAP server has closed the idle "half open" connection after 10 minutes without any notice to the LDAP client, Apache seems to run into the "Internal server error". The LDAP retry is successful, but the "Internal server error" is still shown...
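One way to remove the decimal-versus-hex ambiguity when logging a return code is to print both bases at once. The helper below is a hypothetical sketch (`format_ldap_rc` is not an Apache or LDAP SDK function). Note that `%d` always prints decimal, so a logged 51 would be 0x33 rather than 0x51; as far as I know, RFC 4511 numbers busy as 51 and unavailable as 52 (0x34), while Microsoft's winldap.h defines LDAP_SERVER_DOWN as 0x51 (81 decimal), so the value is worth re-checking against the SDK headers.

```c
#include <stdio.h>

/* Hypothetical helper: writes rc as "decimal (0xhex)" into buf.
 * Returns what snprintf returns (the length it wanted to write). */
static int format_ldap_rc(char *buf, size_t len, int rc)
{
    return snprintf(buf, len, "%d (0x%02x)", rc, (unsigned int)rc);
}
```

Logging the code this way makes it possible to compare it directly against the hexadecimal constants in the SDK headers.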
It seems that it is known what needs to be done :-) (i.e. "it would be better if mod_ldap would send a FIN-ACK when receiving a FIN and tear down the connection completely."). Is there any chance that this is finally going to be implemented? The bug has been known for 1.5 years!
Mod_ldap doesn't deal with network connections at this level. It is the LDAP client library that handles ACKs and FINs. Mod_ldap doesn't know that anything has happened to the network connection until the next time it tries to call an LDAP API. All it can do is handle the error that is returned. In this case, it is already handling the LDAP_SERVER_DOWN error code properly by re-establishing the connection, as was noted in comment #11. The question remains: where is the internal server error message coming from? Somewhere there is a Windows-specific error code that is not being handled properly. Without a Windows box, I can't tell where that is. Somebody with the ability to debug through the code on Windows will need to track it down.
(In reply to comment #13) Brad, thanks for your prompt response. I fully understand your position (and, frankly speaking, assume that in the end <MS AD is to be blamed>). Unfortunately we are not able to debug the Windows code. In our environment Apache is the frontend to Subversion, and Active Directory is used "just" to authenticate the users. While an occasional "Internal Server" error from Apache is not a huge problem for a human user (after retrying once, all works well), it causes SERIOUS problems for automated tasks, e.g. Continuous Integration. I wonder if the solution proposed in comment #2 is a reasonable (dirty) workaround for the "poor" MS AD users ;-) I have already asked at Apache Lounge whether somebody would volunteer to do this job for the community.
We've patched util_ldap.c in our build of Apache so that uldap_connection_open checks for LDAP_UNAVAILABLE. The patch is conditional on #if APR_HAS_MICROSOFT_LDAPSDK. The problem is indeed the Microsoft SDK. When the TCP RST comes in, instead of returning LDAP_SERVER_DOWN to the calling code, it returns LDAP_UNAVAILABLE, which according to the LDAP RFCs is wrong. LDAP_UNAVAILABLE is supposed to be a server return code telling the client, "I'm unavailable, maybe I'm shutting down, maybe I'm in the middle of maintenance". The SDK APIs should not be interpreting TCP RSTs as LDAP_UNAVAILABLE. However, good luck getting Microsoft to admit and/or fix this. If there's enough interest, I can pull out our patch (we have multiple patches against the file and unfortunately no diff files right now), but I think it's pretty easy to change uldap_connection_open to do the retry on LDAP_UNAVAILABLE if the MS LDAP SDK is used.
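A rough sketch of the kind of change described above, for readers who want to see its shape. It is not the actual patch: every `SKETCH_` name, `bind_with_retry` and `flaky_bind` are hypothetical, the numeric codes are illustrative stand-ins for the SDK header values, and `SKETCH_HAS_MICROSOFT_LDAPSDK` merely plays the role of `APR_HAS_MICROSOFT_LDAPSDK`.

```c
/* Illustrative values; real code takes these from the SDK headers. */
#define SKETCH_LDAP_SUCCESS      0x00
#define SKETCH_LDAP_UNAVAILABLE  0x34
#define SKETCH_LDAP_SERVER_DOWN  0x51

/* Hypothetical switch standing in for APR_HAS_MICROSOFT_LDAPSDK. */
#ifndef SKETCH_HAS_MICROSOFT_LDAPSDK
#define SKETCH_HAS_MICROSOFT_LDAPSDK 1
#endif

#if SKETCH_HAS_MICROSOFT_LDAPSDK
/* With the MS SDK, treat LDAP_UNAVAILABLE like a dropped connection. */
#define SKETCH_IS_SERVER_DOWN(rc) \
    ((rc) == SKETCH_LDAP_SERVER_DOWN || (rc) == SKETCH_LDAP_UNAVAILABLE)
#else
#define SKETCH_IS_SERVER_DOWN(rc) ((rc) == SKETCH_LDAP_SERVER_DOWN)
#endif

typedef int (*bind_fn)(void *ctx);

/* Call do_bind until it returns a code that is not "server down",
 * or until max_attempts calls have been made. Returns the last code. */
static int bind_with_retry(bind_fn do_bind, void *ctx, int max_attempts)
{
    int rc = SKETCH_LDAP_SERVER_DOWN;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        rc = do_bind(ctx);
        if (!SKETCH_IS_SERVER_DOWN(rc)) {
            break;
        }
    }
    return rc;
}

/* Stub bind for demonstration: fails once with UNAVAILABLE, then succeeds.
 * ctx points to an int call counter. */
static int flaky_bind(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return (*calls == 1) ? SKETCH_LDAP_UNAVAILABLE : SKETCH_LDAP_SUCCESS;
}
```

With two attempts allowed, the stub's first LDAP_UNAVAILABLE is absorbed and the second bind succeeds; with a single attempt the error reaches the caller, which corresponds to the behaviour users see as a one-off 500.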
Andy, that would be wonderful. There are certainly quite a few people who would like to see a fix (even if it is for a bug in MS's implementation).
Created attachment 21121: use macro for LDAP_SERVER_DOWN. For trunk we'd put the macro in apr-util 1.3.x; for Apache 2.2.x the macro would really be in util_ldap.c. I assume this is effectively the patch some of the commenters are using; can someone give it a try? 2.2.x version of the patch is here: http://people.apache.org/~covener/2.2.x-ldap-unavailable.diff
(In reply to comment #17)
> Created an attachment (id=21121)
> use macro for LDAP_SERVER_DOWN
Doesn't this change reverse the sense of the test?
    - if (LDAP_SERVER_DOWN != rc) {
    + if (APR_LDAP_SERVER_DOWN(rc)) {
It should be
    if (!APR_LDAP_SERVER_DOWN(rc)) {
right?
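The point about the reversed test can be checked with a small stand-alone sketch. The macro and function names below are hypothetical and only mirror the shape of the lines quoted above; the numeric value is an illustrative stand-in for the SDK constant.

```c
/* Illustrative value and hypothetical macro mirroring the patch's shape. */
#define SKETCH_LDAP_SERVER_DOWN 0x51
#define SKETCH_SERVER_DOWN(rc) ((rc) == SKETCH_LDAP_SERVER_DOWN)

/* Original test: true for any code other than server-down. */
static int old_test(int rc)    { return SKETCH_LDAP_SERVER_DOWN != rc; }

/* First patch: sense flipped, true only for server-down. */
static int first_patch(int rc) { return SKETCH_SERVER_DOWN(rc); }

/* Suggested correction: negating the macro restores the original sense. */
static int fixed_patch(int rc) { return !SKETCH_SERVER_DOWN(rc); }
```

For every input, `fixed_patch` agrees with `old_test`, while `first_patch` gives the opposite answer.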
Created attachment 21124: try 2. Thanks khym@azeotrope.org. 2.2.x: http://people.apache.org/~covener/2.2.x-ldap-unavailable-2.diff
Eric's patch is pretty much what we do, and we've tested it against active directory and Sun's LDAP server from Windows Apache servers and it seems to do the right thing.
Created attachment 21129: Patch for Apache 2.2.6 (tested successfully with Apache 2.2.4 as well). This was tested and works with Apache on Windows (built with Visual Studio 2005; download from Apache Lounge). For more details and explanations see http://www.apachelounge.com/forum/viewtopic.php?t=1995
I don't think it's wise to have other SDKs retry when they return LDAP_UNAVAILABLE
I agree with Eric. LDAP_UNAVAILABLE is specifically to tell the ldap client to go away and in cases where it's working correctly, the LDAP client should not be retrying. I prefer the conditional macro definition in the previous patch.
Can someone with APR karma look at this for APR 1.3.x: http://issues.apache.org/bugzilla/show_bug.cgi?id=43875 (thanks sasha, missed the check in authnz_ldap)
(In reply to comment #24) > ... > (thanks sasha, missed the check in authnz_ldap) > You are welcome Eric. Unfortunately MS ActiveDirectory returns LDAP_UNAVAILABLE when it should not, so this is the possible (dirty, but simple) workaround ;-)
re: "LDAP_UNAVAILABLE is specifically to tell the ldap client to go away" It's not clear that this interpretation is universal, or that the handling of LDAP_UNAVAILABLE must necessarily differ from LDAP_SERVER_DOWN. OpenLDAP, for example, will substitute LDAP_UNAVAILABLE for LDAP_SERVER_DOWN. See http://www.openldap.org/devel/cvsweb.cgi/~checkout~/servers/slapd/result.c line 1544 (near the bottom). The OpenLDAP back-server code and test programs retry on LDAP_UNAVAILABLE for all platforms. See tests/progs/slapd-search.c:186 and servers/slapd/back-meta/bind.c:705
It would be great for the LDAP gurus to wrap their heads around this and reach a consensus. Is it time to ping openldap-devel@openldap.org for their official position on retries and timing when presented with LDAP_UNAVAILABLE? Is there a reason this hasn't been at least /conditionally/ committed for win32?
Thanks for revisiting Bill, I've created a separate bug to handle the less pressing issue of retry on LDAP_UNAVAILABLE/BUSY (on non-MS SDK). http://issues.apache.org/bugzilla/show_bug.cgi?id=44155
Fixed in 2.2.8.
*** Bug 42498 has been marked as a duplicate of this bug. ***
Is there any reason that the LDAP change committed to the 2.2.23 release would have caused a regression and made this problem return? I package Subversion Edge, which bundles Apache 2.2.23 + SVN 1.7.7. The version that includes Apache 2.2.23 was only released on October 26th, and we have already had at least a dozen different users reporting that they are now getting this problem. Users get an HTTP 500 when making a Subversion request, and then subsequent requests for that user are fine again for a while. When they get the error, this is what is logged:
[info] [client 204.11.125.146] [1248] auth_ldap authenticate: user XXXXX authentication failed; URI /svn/reposname [LDAP: ldap_simple_bind_s() failed][Unavailable]
This is only happening with Windows Apache servers. I note that the 2.2.23 release includes this change, which seems innocuous but also seems to be the only change related to LDAP in the CHANGES file:
http://svn.apache.org/viewvc?view=revision&revision=1375696
--- httpd/httpd/branches/2.2.x/include/util_ldap.h 2012/08/21 17:48:34 1375695
+++ httpd/httpd/branches/2.2.x/include/util_ldap.h 2012/08/21 17:48:58 1375696
@@ -30,7 +30,7 @@
 #include "apr_time.h"
 #include "apr_ldap.h"
-#if APR_HAS_MICROSOFT_LDAPSDK
+#ifdef LDAP_UNAVAILABLE
 #define AP_LDAP_IS_SERVER_DOWN(s) ((s) == LDAP_SERVER_DOWN \
 ||(s) == LDAP_UNAVAILABLE)
 #else