Bug 50467 - Occasional NIO connector lockups on high load
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 6
Classification: Unclassified
Component: Connectors
Version: 6.0.29
Hardware: Other Linux
Importance: P2 major
Target Milestone: default
Assignee: Tomcat Developers Mailing List
 
Reported: 2010-12-13 14:02 UTC by Steven Hugg
Modified: 2011-01-07 13:40 UTC

Description Steven Hugg 2010-12-13 14:02:35 UTC
We've been running Tomcat 6.0.29 on FC8 2.6.21 with tens of thousands of long polling threads, which usually works fine. Every few days, though, we experience a sudden lockup of the NIO connector, and it has to be restarted. These lockups have been accompanied by the following stack trace:

Exception in thread "http-8082-ClientPoller-0" java.lang.NullPointerException
	at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1620)
	at java.lang.Thread.run(Thread.java:662)

Looking at the source, the issue is likely a race condition where access() is called on a null attachment, probably while the key is in the process of being cancelled:

                    while (iterator != null && iterator.hasNext()) {
                        SelectionKey sk = (SelectionKey) iterator.next();
                        KeyAttachment attachment = (KeyAttachment)sk.attachment();
/*NPE*/                 attachment.access();
                        iterator.remove();
                        processKey(sk, attachment);
                    }//while
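
A minimal sketch of the suspected failure mode, using only java.nio and not Tomcat's code: once some code path calls attach(null) on a SelectionKey, attachment() returns null, so an unguarded attachment.access() in the poller loop would throw the NullPointerException shown above. The class name NullAttachmentDemo and the use of a Pipe in place of a socket are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class; not part of Tomcat.
class NullAttachmentDemo {

    /** Returns what attachment() yields after another code path has cleared it. */
    static Object attachmentAfterClear() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                // A channel must be non-blocking before it can be registered.
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source()
                        .register(selector, SelectionKey.OP_READ, new Object());

                // Simulates the cancellation path clearing the attachment.
                key.attach(null);

                // This is the value the poller loop would see for this key.
                return key.attachment();
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Calling a method on this result without a null check would throw an NPE.
        System.out.println(attachmentAfterClear());
    }
}
```

In the real connector the clearing and the read happen on different threads, so the window is narrow; the sketch only shows that the key itself can legitimately carry a null attachment.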
Comment 1 Christopher Schultz 2010-12-14 14:05:10 UTC
Steven, would it be possible for you to upgrade to the latest (6.0.29) Tomcat version? I seem to recall a recent fix to the NIO connector that fixes some threading issues, though I can't seem to find a reference for it at the moment.
Comment 2 Christopher Schultz 2010-12-14 14:06:43 UTC
Ooh, sorry. I misread your version number. Duh.
Comment 3 Steven Hugg 2010-12-16 17:06:55 UTC
I haven't reproduced it, but I would imagine that inserting a Thread.sleep() after the call to key.attach(null) in cancelledKey() might trigger it.

For now I have just put a null check in the above loop like so:

                        if (attachment != null)
                        {
                            attachment.access();
                            iterator.remove();
                            processKey(sk, attachment);
                        } else {
                            log.warn("NioEndpoint: Attachment was null");
                            iterator.remove();
                        }

Not sure if that is correct, but better than the alternative ;)
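
For reference, the same guard can be sketched as a self-contained program against a real Selector. This is only an illustration under stated assumptions: Attachment is a hypothetical stand-in for Tomcat's KeyAttachment, GuardedPollerLoop and drain() are invented names, a Pipe replaces the socket, and the attach(null) call is made on the same thread to simulate the cancellation that would normally race with the poller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

// Hypothetical demo class; not part of Tomcat.
class GuardedPollerLoop {

    /** Hypothetical stand-in for Tomcat's KeyAttachment. */
    static final class Attachment {
        long lastAccess;
        void access() { lastAccess = System.currentTimeMillis(); }
    }

    /** Drains the selected-key set and returns how many keys were skipped for a null attachment. */
    static int drain(Selector selector) {
        int skipped = 0;
        Iterator<SelectionKey> iterator = selector.selectedKeys().iterator();
        while (iterator.hasNext()) {
            SelectionKey sk = iterator.next();
            Attachment attachment = (Attachment) sk.attachment();
            iterator.remove(); // always remove, whether or not the key is processed
            if (attachment != null) {
                attachment.access();
                // a real poller would dispatch the key for processing here
            } else {
                skipped++; // attachment was cleared, e.g. by a concurrent cancel
            }
        }
        return skipped;
    }

    /** Makes one key ready, clears its attachment, then drains with the guard in place. */
    static int run() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source()
                        .register(selector, SelectionKey.OP_READ, new Attachment());
                pipe.sink().write(ByteBuffer.wrap(new byte[] {1})); // make the source readable
                selector.select(1000);
                key.attach(null); // simulate the cancel happening after select()
                return drain(selector);
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("skipped keys: " + run());
    }
}
```

Removing the key from the selected set before the null check keeps both branches from leaving a stale key behind, which matches the iterator.remove() call in each branch of the patch above.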
Comment 4 Mark Thomas 2011-01-05 08:53:41 UTC
The null check seems reasonable to me.

I have fixed this in 7.0.x and it will be included in 7.0.6 onwards.

I have also proposed the fix for 6.0.x.
Comment 5 Mark Thomas 2011-01-07 13:40:10 UTC
Fixed in 6.0.x and will be included in 6.0.30 onwards.