As suggested by Mladen Turk, I'm filing a bug regarding this tomcat native issue: https://community.jboss.org/thread/212044 It seems that JBoss AS 7.1.2 uses version 1.1.22 of tomcat native. If you need more information, please contact me. Thank you.
Yes, we had a similar issue on the users list, but it was not entered into BZ because the reporter could not figure out the cause. I think I know what the issue might be, so it would be great if you could verify the fix (after I apply it to Subversion).
Yes, I'll verify it the next day(s) after you apply the fix.
OK. I have applied a possible fix. Check out the 1.1.x branch with
    svn co https://svn.apache.org/repos/asf/tomcat/native/branches/1.1.x
or apply the fix to tomcat-native-1.1.24 (this might be easier, since you won't need the apr sources, just the apr-devel package for Ubuntu):
    http://svn.apache.org/viewvc/tomcat/native/branches/1.1.x/native/src/network.c?r1=1403635&r2=1403634&pathrev=1403635&view=patch
OK, sorry for the delay. Can you help me with building the native libraries? I need the library built for a linux-x86-64 environment. Unfortunately I have only one such machine, the live machine, and I cannot build there.
What's the OS (and version) you are running that on?
Ubuntu server 10.04.
Created attachment 29548: tomcat-native dev. Try this one. It's built on Debian 6 (the closest I have to Ubuntu 10).
It's not working :( I get no errors on JBoss startup, but JBoss Web refuses to serve any HTTP requests with this new library. There are no errors; the request just hangs until it times out. I only replaced the old native library with the new one. Should I have rebuilt the whole of JBoss and JBoss Web? That seems complicated, as JBoss pulls in tomcat native through a jboss native dependency.
OK. It seems it's either the wrong binary or the patch is faulty. Nevertheless, let me set up Ubuntu 10.04-4 and I'll check that. Where did you get AS 7.1.2 from? That's not an official community release, so it's either built from source or it's from EAP-6.0. In the latter case, which natives are you using?
Yes. My JBoss is built from the GitHub sources: https://github.com/jbossas/jboss-as/tree/7.1.2.Final.
OK, so it seems we are still missing some info.
1. Did you build tomcat-native as well? If not, where does it come from?
2. Where do you load tomcat-native from: modules/org/jboss/as/web/main/lib/linux-x86_64, or do you depend on the system LD_LIBRARY_PATH?
3. IIUC you are using https (openssl version 0.9.8k) when those delays occur. Does it happen for the non-SSL layer as well?
Created attachment 29560: tomcat native for ubuntu 10.04. This is built on Ubuntu server 10.04, linking to the system's apr (1.3.8).
Answers:
1. No. It seems that the JBoss build system produces the native libraries automagically. It declares a dependency on jboss native 2.0.10 and I think it downloads the libraries from somewhere; they couldn't have been built locally. If you build JBoss 7.1.2 from GitHub you will have the exact libraries I'm using.
2. JBoss versions after 7.1.1 have a "native" configuration parameter for JBoss Web, <subsystem xmlns="urn:jboss:domain:web:1.1" ... native="true">, that enables or disables native support. The libraries are definitely loaded from modules/org/jboss/as/web/main/lib/linux-x86_64. That is where I replaced the JBoss library with the library you sent me.
3. Yes. The application runs only on https. The slow connections with high CPU usage happen only when using SSL with the native connector (native="true"). When using SSL with the Java connector (native="false") there are some slow requests, but they are deterministic: clients connecting with strange devices or over slow connections. More importantly, the CPU usage remains normal.
The bad news is that the last library you attached works better, but is still not usable. It serves https requests, but the pages are only partially loaded. Sorry.
OK, thanks for the detailed feedback. It seems the patch I made is faulty, so let me try something different. Because of the high CPU usage you observe, it's probable that we have some endless loop, and detecting such scenarios can be a real PITA, especially since it's caused by some sort of client-server communication irregularity. I'd appreciate it if you could test a few more versions with some debug logging added, so we get some clue about what is going on with those edge-case requests. BTW, what kind of client are you using, and is there some consistency in client usage (e.g. more errors with a particular client, or similar)?
I think I found a pattern for the slow connections. For the last week I've used only the Java https connector, not the native connector, and the very slow connections (300s, 600s) still happen at the same rate: 0.01 - 0.02% of all requests. I checked the logs from the previous weeks, when I was using the native connector, and found that they happen when a client with a dynamic IP has its IP changed in the middle of a request: it starts the request from one IP and waits for the response on another. It might happen on slow connections when receiving a larger response. So the only difference between the Java connector and the native connector is that the latter uses the CPU intensively while handling this case. After the request is served the CPU usage goes down. Why are the connectors slow in this case? What can be done? If you have a patch for the native connector that solves at least the high CPU problem, I'll try it.
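For anyone who wants to measure the same rate on their own logs, here is a minimal sketch in Python. It assumes a hypothetical log format of one request per line, "<client_ip> <duration_in_seconds>"; the names `Request`, `parse_line` and `slow_request_rate` are illustrative only and are not part of JBoss or tomcat native.

```python
from dataclasses import dataclass


@dataclass
class Request:
    client_ip: str      # address the request arrived from
    duration_s: float   # time taken to serve the request, in seconds


def parse_line(line: str) -> Request:
    """Parse one line of the assumed format '<client_ip> <duration_s>'."""
    ip, duration = line.split()
    return Request(ip, float(duration))


def slow_request_rate(requests, threshold_s=300.0):
    """Return the percentage of requests that took at least threshold_s seconds."""
    if not requests:
        return 0.0
    slow = sum(1 for r in requests if r.duration_s >= threshold_s)
    return 100.0 * slow / len(requests)
```

With a 300 s threshold, one slow request out of 10,000 corresponds to the lower end of the 0.01 - 0.02% range mentioned above. A real access log would need its own parser.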
Hmm, this looks like an endless loop where the socket is trying to write to something that's not there any more. The problem is how to simulate such a situation. Since it happens with SSL only, I presume it's hidden somewhere in our SSL send loop. Since it seems the only way forward is trial and error, let me check a few options. I'll prepare a real tcnative binary for Linux, so I hope you'll be able to check a few runs.
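To illustrate the kind of failure suspected here, the following is a generic Python sketch of a non-blocking SSL send loop. It is not the tcnative code (which is written in C) and it is not the fix; `send_all`, `timeout_s` and `sleep_s` are hypothetical names chosen for the example. The point is only that a loop which retries on "want write" with no deadline and no back-off will spin at full CPU when the peer has silently gone away.

```python
import ssl
import time


def send_all(sock, data, timeout_s=5.0, sleep_s=0.01):
    """Send all of data on a non-blocking SSL socket, giving up after timeout_s.

    Without the deadline check, a peer that has gone away can leave this loop
    retrying forever on 'want write', which shows up as high CPU usage.
    """
    deadline = time.monotonic() + timeout_s
    sent = 0
    while sent < len(data):
        try:
            sent += sock.send(data[sent:])
        except (ssl.SSLWantWriteError, ssl.SSLWantReadError, BlockingIOError):
            if time.monotonic() >= deadline:
                raise TimeoutError("peer did not accept data before the deadline")
            time.sleep(sleep_s)  # back off instead of spinning
    return sent
```

Whether the real problem sits in a loop of this shape is still an open question at this point in the thread; the sketch only shows why a missing exit condition would match the observed symptoms.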
*** This bug has been marked as a duplicate of bug 52856 ***