We've been running Tomcat 6.0.29 on FC8 2.6.21 with tens of thousands of long-polling threads, which usually works fine. Every few days, though, the NIO connector suddenly locks up and has to be restarted. The lockups have been accompanied by the following stack trace:

    Exception in thread "http-8082-ClientPoller-0" java.lang.NullPointerException
        at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1620)
        at java.lang.Thread.run(Thread.java:662)

Looking at the source, the likely cause is a race condition in which access() is called on a null attachment, probably while the key is in the process of being cancelled:

    while (iterator != null && iterator.hasNext()) {
        SelectionKey sk = (SelectionKey) iterator.next();
        KeyAttachment attachment = (KeyAttachment)sk.attachment();
        /*NPE*/ attachment.access();
        iterator.remove();
        processKey(sk, attachment);
    }//while
Steven, would it be possible for you to upgrade to the latest (6.0.29) Tomcat version? I seem to recall a recent fix to the NIO connector that fixes some threading issues, though I can't seem to find a reference for it at the moment.
Ooh, sorry. I misread your version number. Duh.
I haven't reproduced it, but I would imagine that inserting a Thread.sleep() after the call to key.attach(null) in cancelledKey() might do it. For now I have just put a null check in the above loop, like so:

    if (attachment != null) {
        attachment.access();
        iterator.remove();
        processKey(sk, attachment);
    } else {
        log.warn("NioEndpoint: Attachment was null");
        iterator.remove();
    }

Not sure if that is correct, but better than the alternative ;)
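For anyone who wants to see the failure mode outside Tomcat, here is a minimal standalone sketch using only java.nio. It is not the Tomcat code: the Attachment class, pollOnce() and demo() are hypothetical names made up for illustration, and the call to key.attach(null) merely simulates what cancelledKey() is assumed to do on another thread. The point is that a key can still show up in the selected set after its attachment has been cleared, so the poll loop has to tolerate a null attachment.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

class NullAttachmentSketch {

    /** Hypothetical stand-in for Tomcat's KeyAttachment. */
    static final class Attachment {
        long lastAccess;
        void access() { lastAccess = System.currentTimeMillis(); }
    }

    /** One pass of a poller-style loop; returns how many keys had a null attachment. */
    static int pollOnce(Selector selector) throws IOException {
        int skipped = 0;
        selector.select(1000); // wait up to one second for a ready key
        Iterator<SelectionKey> iterator = selector.selectedKeys().iterator();
        while (iterator.hasNext()) {
            SelectionKey sk = iterator.next();
            Attachment attachment = (Attachment) sk.attachment();
            iterator.remove();
            if (attachment == null) {
                // Attachment was cleared concurrently: skip instead of throwing an NPE.
                skipped++;
                continue;
            }
            attachment.access();
            // Real code would go on to process the key here.
        }
        return skipped;
    }

    /** Registers a readable pipe, clears its attachment, then polls once. */
    static int demo() throws IOException {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        try {
            pipe.source().configureBlocking(false);
            SelectionKey key = pipe.source().register(
                    selector, SelectionKey.OP_READ, new Attachment());
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1})); // make the source readable
            key.attach(null); // simulates the assumed effect of cancelledKey()
            return pollOnce(selector);
        } finally {
            pipe.sink().close();
            pipe.source().close();
            selector.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("keys skipped due to null attachment: " + demo());
    }
}
```

Without the null check in pollOnce(), the attachment.access() call would throw the same kind of NullPointerException as in the stack trace above. Whether silently skipping the key is the right recovery inside Tomcat is a separate question; this only illustrates the guard.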
The null check seems reasonable to me. I have fixed this in 7.0.x and it will be included in 7.0.6 onwards. I have also proposed the fix for 6.0.x.
Fixed in 6.0.x and will be included in 6.0.30 onwards.