Bug 54064 - tomcat native randomly very slow with high CPU usage
Status: RESOLVED DUPLICATE of bug 52856
Alias: None
Product: Tomcat Native
Classification: Unclassified
Component: Library
Version: 1.1.22
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
Reported: 2012-10-29 13:17 UTC by dragos cernahoschi
Modified: 2012-11-19 05:35 UTC

Attachments
tomcat-native dev (197.12 KB, application/x-gzip)
2012-11-05 13:12 UTC, Mladen Turk
tomcat native for ubuntu 10.04 (319.59 KB, application/x-gzip)
2012-11-06 16:31 UTC, Mladen Turk

Description dragos cernahoschi 2012-10-29 13:17:20 UTC
As suggested by Mladen Turk, I'm filing a bug regarding this tomcat native issue:
https://community.jboss.org/thread/212044

It seems that JBoss AS 7.1.2 uses version 1.1.22 of tomcat native.

If you need more information please contact me.

Thank you.
Comment 1 Mladen Turk 2012-10-29 13:23:20 UTC
Yes, we had a similar issue on the users list, but it was not entered into BZ because the reporter could not figure out the cause. I think I know what the issue might be, so it would be great if you could verify the fix (after I apply it to Subversion).
Comment 2 dragos cernahoschi 2012-10-29 16:21:07 UTC
Yes, I'll verify it within a day or so after you apply the fix.
Comment 3 Mladen Turk 2012-10-30 07:31:41 UTC
OK.
I have applied a possible fix.
Check out the 1.1.x branch

svn co https://svn.apache.org/repos/asf/tomcat/native/branches/1.1.x

or apply the fix to tomcat-native-1.1.24 (this might be easier, since you won't
need the apr sources, just the apr-devel package for Ubuntu)

http://svn.apache.org/viewvc/tomcat/native/branches/1.1.x/native/src/network.c?r1=1403635&r2=1403634&pathrev=1403635&view=patch
Comment 4 dragos cernahoschi 2012-11-05 11:59:37 UTC
OK, sorry for the delay.

Can you help me with building the native libraries? I need the library built for a linux-x86-64 environment. Unfortunately I have only one such machine, the live machine, and I cannot build there.
Comment 5 Mladen Turk 2012-11-05 12:40:33 UTC
What OS (and version) are you running it on?
Comment 6 dragos cernahoschi 2012-11-05 12:54:50 UTC
Ubuntu server 10.04.
Comment 7 Mladen Turk 2012-11-05 13:12:11 UTC
Created attachment 29548
tomcat-native dev

Try this one. It's built on Debian 6 (the closest I have to Ubuntu 10).
Comment 8 dragos cernahoschi 2012-11-05 22:09:31 UTC
It's not working :( I get no errors on JBoss startup, but JBoss Web refuses to serve any HTTP requests with this new library. There are no errors; the request just hangs until it times out.

I just replaced the old native library with the new one. Should I have rebuilt the whole of JBoss and JBoss Web? That seems complicated, as JBoss pulls in tomcat native through a jboss native dependency.
Comment 9 Mladen Turk 2012-11-06 04:13:02 UTC
OK. It seems it's either the wrong binary or the patch is faulty.
Nevertheless, let me set up Ubuntu 10.04-4 and I'll check that.
Where did you get AS 7.1.2 from? That's not an official community release, so it's either built from source or it's from EAP-6.0. In the latter case, which natives are you using?
Comment 10 dragos cernahoschi 2012-11-06 07:49:31 UTC
Yes. My JBoss is built from the github sources: https://github.com/jbossas/jboss-as/tree/7.1.2.Final.
Comment 11 Mladen Turk 2012-11-06 14:12:43 UTC
OK, so it seems we are still missing some info.

1. Did you build tomcat-native as well?
   If not, where does it come from?
2. Where do you load tomcat-native from:
   modules/org/jboss/as/web/main/lib/linux-x86_64,
   or do you depend on the system LD_LIBRARY_PATH?
3. IIUC you are using https (openssl version 0.9.8k) when those delays occur.
   Does it happen on the non-SSL layer as well?
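
For question 2, one way to check which copy of the library a running JVM actually loaded on Linux is to read /proc/<pid>/maps. A minimal sketch, assuming the library file name contains "tcnative" (for example libtcnative-1.so); the helper names are illustrative and not part of any JBoss or Tomcat tooling:

```python
def loaded_tcnative_paths(maps_text, needle="tcnative"):
    """Return the distinct file paths in /proc/<pid>/maps content that contain `needle`."""
    paths = []
    for line in maps_text.splitlines():
        # Each line is: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if needle in path and path not in paths:
            paths.append(path)
    return paths


def read_maps(pid):
    """Read the memory map of a live process (Linux only)."""
    with open(f"/proc/{pid}/maps") as fh:
        return fh.read()
```

Calling loaded_tcnative_paths(read_maps(pid)) with the JBoss process id would then show whether the library comes from the modules directory or from a system path.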
Comment 12 Mladen Turk 2012-11-06 16:31:56 UTC
Created attachment 29560
tomcat native for ubuntu 10.04

This is built on Ubuntu server 10.04, linking to the system's apr (1.3.8).
Comment 13 dragos cernahoschi 2012-11-06 22:26:43 UTC
Answers:

1. No. It seems that the JBoss build system produces the native libraries automagically. It declares a dependency on jboss native 2.0.10, and I think it downloads the libraries from somewhere; they couldn't have been built locally. If you build JBoss 7.1.2 from github you will have the exact libraries I'm using.

2. JBoss versions > 7.1.1 have a "native" configuration parameter for JBoss Web, <subsystem xmlns="urn:jboss:domain:web:1.1" ... native="true">, that enables or disables native support. The libraries are definitely loaded from modules/org/jboss/as/web/main/lib/linux-x86_64. That is where I replaced the JBoss library with the library you sent me.

3. Yes. The application runs only on https. The slow connections with high CPU usage happen only when using SSL with the native connector (native="true"). When using SSL with the Java connector (native="false") there are some slow requests, but they are deterministic (clients connecting with strange devices, slow connections) and, more importantly, the CPU usage remains normal.

The bad news is that the last library you attached works better, but is still not usable. It serves https requests, but the pages are only partially loaded. Sorry.
Comment 14 Mladen Turk 2012-11-07 04:28:02 UTC
OK, thanks for the detailed feedback.
It seems the patch I made is faulty, so let me try something different.

Given the high CPU usage you observe, we probably have some endless loop, and detecting such scenarios can be a real PITA, especially since it's caused by some sort of client-server communication irregularity.

I'd appreciate it if you could test a few more versions with some debug logging added, so we get some clue about what is going on with those marginal requests.

BTW, what kind of client are you using, and is there any consistency in client usage (e.g. more errors with a particular client or similar)?
Comment 15 dragos cernahoschi 2012-11-09 10:50:48 UTC
I think I found a pattern for the slow connections. For the last week I've used only the Java https connector, not the native connector, and the very slow connections (300s, 600s) still happen at the same rate: 0.01 - 0.02% of all requests.

I checked the logs from the previous weeks, when I was using the native connector, and found that they happen when a client with a dynamic IP has its IP changed in the middle of a request: it starts the request with one IP and waits for the response on another. It might happen on slow connections when receiving a larger response.
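
A check of this kind could be sketched roughly as follows. This is only an illustration: it assumes each log record can be reduced to a (session_id, client_ip, start_time, duration_seconds) tuple, which is a hypothetical format and not the actual log layout, and it treats "the same session is next seen from a different IP" as a stand-in for an IP change during the request.

```python
from collections import defaultdict


def slow_requests_with_ip_change(records, threshold=300.0):
    """Return (slow_rate_percent, flagged).

    records: iterable of (session_id, client_ip, start_time, duration_seconds).
    flagged: slow requests whose session was next seen from a different IP.
    """
    by_session = defaultdict(list)
    total = 0
    # Group requests per session, in chronological order.
    for rec in sorted(records, key=lambda r: r[2]):
        by_session[rec[0]].append(rec)
        total += 1

    slow = 0
    flagged = []
    for reqs in by_session.values():
        # Pair every request with the following one in the same session.
        for cur, nxt in zip(reqs, reqs[1:] + [None]):
            if cur[3] < threshold:
                continue
            slow += 1
            if nxt is not None and nxt[1] != cur[1]:
                flagged.append(cur)

    rate = 100.0 * slow / total if total else 0.0
    return rate, flagged
```

The returned rate is a percentage of all requests, so it can be compared directly with the 0.01 - 0.02% figure above.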

So the only difference between the Java connector and the native connector is that the latter uses the CPU intensively while handling this case. After the request is served, the CPU usage goes down.

Why are the connectors slow in this case? What can be done?

If you have a patch for the native connector that solves at least the high CPU problem, I'll try it.
Comment 16 Mladen Turk 2012-11-12 05:40:55 UTC
Hmm,

This looks like an endless loop where the socket is trying to write to something that's not there any more. The problem is how to simulate such a situation. Since it happens with SSL only, I presume it's hidden somewhere in our SSL send loop. Since the only way forward seems to be trial and error, let me check a few options. I'll prepare a real tcnative binary for Linux, so I hope you'll be able to do a few test runs.
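
The suspected failure mode can be illustrated in isolation. The sketch below is not the tcnative code; it is a hypothetical Python model in which a fake non-blocking peer (GonePeer, an invented name) never becomes writable. A send loop that retries on "would block" without waiting or checking a deadline would spin on the CPU for as long as the request lives, whereas a loop with a deadline and a pause between retries gives up cleanly.

```python
import time


class GonePeer:
    """Stand-in for a non-blocking socket whose peer has vanished:
    every write attempt reports 'would block' (returns None)."""

    def __init__(self):
        self.attempts = 0

    def try_write(self, data):
        self.attempts += 1
        return None  # never accepts any bytes


def send_all(sock, data, timeout, poll_interval=0.01):
    """Write `data`, retrying on 'would block' until `timeout` seconds pass.

    Returns True if everything was written, False on timeout.
    The pause between retries keeps the CPU idle; with poll_interval=0
    the loop degenerates into a busy loop until the deadline.
    """
    deadline = time.monotonic() + timeout
    sent = 0
    while sent < len(data):
        n = sock.try_write(data[sent:])
        if n:  # some bytes were accepted
            sent += n
            continue
        if time.monotonic() >= deadline:
            return False  # give up instead of looping forever
        if poll_interval:
            time.sleep(poll_interval)
    return True
```

A real implementation would more likely wait for writability with poll or select than sleep for a fixed interval; the fixed sleep is used here only to keep the sketch short.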
Comment 17 Mladen Turk 2012-11-19 05:35:30 UTC

*** This bug has been marked as a duplicate of bug 52856 ***