THRIFT-4847: CancelledKeyException causes TThreadedSelectorServer to fail

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: None
    • Component/s: Java - Library
    • Labels: None

      Description

      When attempting to use TThreadedSelectorServer I see the following exception and then the server becomes inoperable.

      2019-04-03 11:50:37,638 [server.TThreadedSelectorServer] ERROR: run() on SelectorThread exiting due to uncaught error
      java.nio.channels.CancelledKeyException
              at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
              at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:82)
              at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.changeSelectInterests(AbstractNonblockingServer.java:440)
              at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.processInterestChanges(AbstractNonblockingServer.java:191)
              at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.run(TThreadedSelectorServer.java:548)
      

      I tracked this down and I think it is caused by the following sequence of events:

      1. A frame buffer is created and given a selection key (TThreadedSelectorServer.java line 691).
      2. The selector-rebuild code introduced in THRIFT-4251 is triggered, and all of the selector's keys are cancelled when the selector is closed (TThreadedSelectorServer.java line 668).
      3. The frame buffer then attempts to modify its now-invalid selection key, causing the exception (AbstractNonblockingServer.java line 440).
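
The underlying java.nio behaviour in steps 2 and 3 can be shown outside Thrift. The sketch below is not Thrift code: the class and method names are made up for illustration, and a Pipe stands in for a client connection. It registers a channel, closes the selector (which invalidates every key still registered with it), and then tries to change the key's interest set, as changeSelectInterests does in the stack trace.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical stand-alone demo; none of these names come from Thrift.
public class CancelledKeyDemo {

    /** Returns true if interestOps() throws CancelledKeyException after the selector is closed. */
    static boolean interestOpsFailsAfterSelectorClose() {
        try {
            Selector selector = Selector.open();
            Pipe pipe = Pipe.open();
            try {
                // Step 1: a channel gets a selection key from the selector.
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);

                // Step 2: closing the selector invalidates all of its keys.
                selector.close();

                // Step 3: the holder of the stale key tries to change its interest set.
                try {
                    key.interestOps(SelectionKey.OP_READ);
                    return false;
                } catch (CancelledKeyException expected) {
                    return true;
                }
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("CancelledKeyException after close: "
                + interestOpsFailsAfterSelectorClose());
    }
}
```

This only demonstrates the JDK contract; whether the frame buffer in the report still holds a key from the old selector at that moment is the reporter's hypothesis, not something the sketch proves.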

      I added some logging and found that selector.select() would return 0 hundreds of times, but not indefinitely. I changed SELECTOR_AUTO_REBUILD_THRESHOLD from 512 to 1,000,000 and the bug did not occur. I don't think this change is the fix; it's just something I did while debugging. I'm not sure what the best fix is.
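
To make the threshold experiment concrete, here is a minimal sketch of a counter that decides when a rebuild would be triggered. It is an assumption about the general shape of such logic, not a copy of Thrift's implementation: the class name ZeroSelectCounter is hypothetical, and it assumes the count covers consecutive zero-returning select() calls and resets when a select() returns keys.

```java
// Hypothetical helper; Thrift's actual rebuild logic may differ in detail.
public class ZeroSelectCounter {
    private final int threshold;
    private int consecutiveZeroSelects;

    public ZeroSelectCounter(int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
    }

    /**
     * Records the return value of one select() call.
     * Returns true when the run of zero results reaches the threshold,
     * which is the point at which a selector rebuild would be triggered.
     */
    public boolean record(int selectedKeys) {
        if (selectedKeys > 0) {
            consecutiveZeroSelects = 0; // real work arrived, start counting again
            return false;
        }
        consecutiveZeroSelects++;
        if (consecutiveZeroSelects >= threshold) {
            consecutiveZeroSelects = 0; // reset after signalling a rebuild
            return true;
        }
        return false;
    }

    public int consecutiveZeroSelects() {
        return consecutiveZeroSelects;
    }
}
```

Under these assumptions, a run of a few hundred zero results can reach a threshold of 512 but never comes close to 1,000,000, which would be consistent with the bug disappearing once the threshold was raised.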

      The situation that triggers this seems to be lots of connections in a very short time period.
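
For anyone trying to reproduce this, a burst of connections can be generated with a small client like the sketch below. Everything in it is an assumption for illustration (the class name, the parameters, and the idea that plain TCP connects are enough to provoke the condition); it opens raw sockets only and does not speak the Thrift protocol, so it may or may not trigger the bug against a real server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical load generator; not part of Thrift or of the original report.
public class ConnectionBurst {

    /**
     * Opens up to {@code count} TCP connections to host:port as fast as possible,
     * holds them open until all attempts are done, then closes them.
     * Returns the number of connections that were established.
     */
    static int openBurst(String host, int port, int count, int timeoutMillis) {
        List<Socket> sockets = new ArrayList<>();
        int connected = 0;
        try {
            for (int i = 0; i < count; i++) {
                Socket socket = new Socket();
                sockets.add(socket);
                try {
                    socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                    connected++;
                } catch (IOException e) {
                    // A failed attempt is skipped; the burst continues.
                }
            }
        } finally {
            for (Socket socket : sockets) {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if close fails.
                }
            }
        }
        return connected;
    }

    public static void main(String[] args) {
        // Usage: java ConnectionBurst <host> <port> <count>
        int n = openBurst(args[0], Integer.parseInt(args[1]), Integer.parseInt(args[2]), 2000);
        System.out.println("Connections established: " + n);
    }
}
```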

              People

              • Assignee: Unassigned
              • Reporter: Keith Turner (kturner)