Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 0.12.0
- None
Description
When attempting to use TThreadedSelectorServer I see the following exception and then the server becomes inoperable.
2019-04-03 11:50:37,638 [server.TThreadedSelectorServer] ERROR: run() on SelectorThread exiting due to uncaught error
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:82)
    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.changeSelectInterests(AbstractNonblockingServer.java:440)
    at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.processInterestChanges(AbstractNonblockingServer.java:191)
    at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.run(TThreadedSelectorServer.java:548)
I tracked this down, and I think it is caused by the following sequence of events:
- A frame buffer is created and given a selection key (TThreadedSelectorServer.java line 691).
- The rebuild-selector code introduced in THRIFT-4251 is triggered, and all of the selector's keys are canceled when the selector is closed (TThreadedSelectorServer.java line 668).
- A frame buffer attempts to modify its now-invalid selection key, causing an exception (AbstractNonblockingServer.java line 440).
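The sequence above can be reproduced in isolation with plain java.nio, without Thrift. The following is a minimal sketch, not the Thrift code path itself: the class and method names are made up for illustration, and a Pipe stands in for a client socket so that no network access is needed. It registers a channel, closes the selector (which invalidates every key registered with it), and then tries to change the interest ops on the stale key.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class; not part of Thrift.
class CancelledKeyAfterSelectorClose {

    /**
     * Returns true if changing the interest ops of a key throws
     * CancelledKeyException once the key's selector has been closed.
     */
    static boolean failsAfterSelectorClose() {
        try {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);

                // Step 1: a "frame buffer" is handed a selection key.
                Selector oldSelector = Selector.open();
                SelectionKey key =
                    pipe.source().register(oldSelector, SelectionKey.OP_READ);

                // Step 2: the selector is closed (as a rebuild would do),
                // which invalidates all keys registered with it.
                oldSelector.close();

                // Step 3: the holder of the stale key tries to modify it.
                try {
                    key.interestOps(SelectionKey.OP_READ);
                    return false;
                } catch (CancelledKeyException e) {
                    return true;
                }
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("CancelledKeyException after close: "
            + failsAfterSelectorClose());
    }
}
```

In the server, the exception escapes the selector thread's run() loop, which would explain why the server stops serving requests after it is logged.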
I added some logging and found that selector.select() would return 0 hundreds of times, but not infinitely. I changed SELECTOR_AUTO_REBUILD_THRESHOLD from 512 to 1,000,000 and the bug did not happen. I don't think this change is the fix; it's just what I did while debugging. I'm not sure what the best fix is.
The situation that triggers this seems to be lots of connections in a very short time period.
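One possible direction for a fix, sketched here under stated assumptions, is to have the rebuild step hand each attachment its new key before the old selector is closed, so that nothing is left holding an invalid key. This is not the change that was actually made in Thrift. KeyHolder is a hypothetical stand-in for a frame buffer, rebuild() is an illustrative helper, and the sketch is single-threaded, whereas a real server would also have to handle interest changes queued from other threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical sketch class; not part of Thrift.
class SelectorRebuildSketch {

    /** Hypothetical stand-in for a frame buffer that caches its key. */
    static final class KeyHolder {
        volatile SelectionKey key;
    }

    /**
     * Opens a new selector, re-registers every valid channel with it,
     * updates each KeyHolder attachment, then closes the old selector.
     */
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        for (SelectionKey oldKey : oldSelector.keys()) {
            if (!oldKey.isValid()) {
                continue; // already canceled; nothing to carry over
            }
            Object attachment = oldKey.attachment();
            SelectionKey newKey = oldKey.channel()
                .register(newSelector, oldKey.interestOps(), attachment);
            if (attachment instanceof KeyHolder) {
                // Point the holder at its new key before the old one dies.
                ((KeyHolder) attachment).key = newKey;
            }
        }
        oldSelector.close(); // invalidates the old keys
        return newSelector;
    }

    /** Returns true if the holder's key still works after a rebuild. */
    static boolean holderKeyUsableAfterRebuild() {
        try {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                Selector oldSelector = Selector.open();
                KeyHolder holder = new KeyHolder();
                holder.key = pipe.source()
                    .register(oldSelector, SelectionKey.OP_READ, holder);

                Selector newSelector = rebuild(oldSelector);
                try {
                    // Would throw CancelledKeyException on a stale key.
                    holder.key.interestOps(SelectionKey.OP_READ);
                    return holder.key.isValid()
                        && holder.key.selector() == newSelector;
                } finally {
                    newSelector.close();
                }
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Checking key.isValid() before calling interestOps() is a cheaper guard, but on its own it only avoids the exception; the connection would still need to be re-registered or closed.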
Issue Links
- is caused by: THRIFT-4251 Java Epoll Selector Bug (Closed)
- relates to: THRIFT-5230 Fix connection leak and CancelledKeyException when handling Epoll bug (Closed)