Bug 55267 - NIO thread locked
Summary: NIO thread locked
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 7
Classification: Unclassified
Component: Connectors
Version: 7.0.41
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
Reported: 2013-07-15 11:38 UTC by Pavel
Modified: 2013-07-24 15:02 UTC

Attachments
dump (40.21 KB, application/x-zip-compressed)
2013-07-15 12:51 UTC, Pavel

Description Pavel 2013-07-15 11:38:25 UTC
I ran a load test and a thread got stuck; see the dump.

The test simply simulates 100 users that connect and disconnect.
Comment 1 Mark Thomas 2013-07-15 12:08:58 UTC
There is no thread dump attached to this issue. The bug will get resolved as INVALID without one.
Comment 2 Pavel 2013-07-15 12:51:34 UTC
Created attachment 30595
dump
Comment 3 Mark Thomas 2013-07-22 17:24:04 UTC
Thanks for the thread dump. I'm going to assume that the other information provided elsewhere is unchanged, i.e. the thread is not stuck but is configured to use an excessively long timeout of 30 minutes.

I have not been able to recreate this issue - even with a debugger - but I think I have got close enough to figure out what is going on and I believe I have a solution.

I believe the following is happening:
- Atmosphere is setting a 30 minute timeout for Comet connections
- That timeout is used to set the socket read/write timeout
- When the client disconnects, Atmosphere closes the Comet event
- That triggers a write to the socket, and at this point the timeout is still 30 minutes
- Something about the exact timing and state means that an exception is not thrown
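
The sequence above can be pictured with a minimal sketch. This is an illustration only, not Tomcat or Atmosphere code: the class SharedTimeoutSocket and its methods are hypothetical, and the 20 second default is an assumed value. It shows how a single timeout field, once raised for Comet, would also govern the write that closing the event triggers.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model (not a Tomcat class): one timeout shared by reads and writes.
final class SharedTimeoutSocket {
    private long timeoutMillis;

    SharedTimeoutSocket(long defaultTimeoutMillis) {
        this.timeoutMillis = defaultTimeoutMillis;
    }

    // Setting the Comet timeout overwrites the only timeout the socket has.
    void setCometTimeout(long millis) {
        this.timeoutMillis = millis;
    }

    long readTimeoutMillis() {
        return timeoutMillis;
    }

    // Writes consult the same field, so they inherit the Comet timeout.
    long writeTimeoutMillis() {
        return timeoutMillis;
    }

    public static void main(String[] args) {
        // Assumed default of 20 seconds, chosen for illustration only.
        SharedTimeoutSocket socket = new SharedTimeoutSocket(TimeUnit.SECONDS.toMillis(20));
        // 30 minutes is the Comet timeout described in this report.
        socket.setCometTimeout(TimeUnit.MINUTES.toMillis(30));
        // The event is now being closed; the write it triggers may wait this long.
        System.out.println("write timeout on close (ms): " + socket.writeTimeoutMillis());
    }
}
```

In this model, a write that neither completes nor fails immediately would hold its thread for the full 30 minutes, which would look like a locked thread in a dump.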

One could argue several root causes:
 1 Tomcat should only apply the Comet timeout to reads, not writes
 2 Atmosphere should reset the timeout once an error occurs
 3 Tomcat should reset the timeout once an error occurs
 4 Tomcat should reset the timeout as soon as close is called on the comet event

We might be able to do something about 1 but 3 and/or 4 look like simpler solutions. Looking at implementing a solution is next on my TODO list.
Comment 4 Pavel 2013-07-23 08:02:57 UTC
Great - waiting for the fix. (I've been trying to figure out this issue for months, but it is indeed very hard to replicate locally.)
Comment 5 Mark Thomas 2013-07-23 14:28:30 UTC
I believe I have fixed this in trunk. In the end, option 1 was simpler to implement. You'll need to build from svn to test this. Let us know how you get on. If you still see the problem, feel free to re-open this issue.
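
For readers who want a picture of what option 1 means, here is a rough sketch. It is not the actual change committed to trunk: the class ReadOnlyCometTimeoutSocket and its methods are hypothetical, and the 20 second default is an assumed value. The idea is only that the Comet timeout is stored separately and applied to reads, so writes keep the ordinary timeout.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of option 1 (not a Tomcat class):
// the Comet timeout is applied to reads only.
final class ReadOnlyCometTimeoutSocket {
    private final long writeTimeoutMillis;
    private long readTimeoutMillis;

    ReadOnlyCometTimeoutSocket(long defaultTimeoutMillis) {
        this.readTimeoutMillis = defaultTimeoutMillis;
        this.writeTimeoutMillis = defaultTimeoutMillis;
    }

    // Only the read timeout changes; the write timeout is left untouched.
    void setCometTimeout(long millis) {
        this.readTimeoutMillis = millis;
    }

    long readTimeoutMillis() {
        return readTimeoutMillis;
    }

    long writeTimeoutMillis() {
        return writeTimeoutMillis;
    }

    public static void main(String[] args) {
        // Assumed default of 20 seconds, chosen for illustration only.
        ReadOnlyCometTimeoutSocket socket =
                new ReadOnlyCometTimeoutSocket(TimeUnit.SECONDS.toMillis(20));
        socket.setCometTimeout(TimeUnit.MINUTES.toMillis(30));
        System.out.println("read timeout (ms): " + socket.readTimeoutMillis());
        System.out.println("write timeout (ms): " + socket.writeTimeoutMillis());
    }
}
```

With the two values separated, a write triggered by closing the event after the client has disconnected would be bounded by the shorter default instead of the 30 minute Comet timeout.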
Comment 6 Pavel 2013-07-24 15:02:59 UTC
Great - quick question: can I build trunk and use that Tomcat in production, or do I need to do some optimizations to use the trunk version in production?