Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-1237

ERRORs being logged when queued responses are sent after socket has closed.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.3.4, 3.4.0, 3.5.0
    • Fix Version/s: None
    • Component/s: server
    • Labels:
      None

      Description

      After applying ZOOKEEPER-1049 to 3.3.3 (I believe the same problem exists in 3.4/3.5 but haven't tested this) I'm seeing the following exception more frequently:

      Oct 19, 1:31:53 PM ERROR
      Unexpected Exception:
      java.nio.channels.CancelledKeyException
      at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
      at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
      at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
      at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
      at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
      at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
      

      This is a long standing problem where we try to send a response after the socket has been closed. Prior to ZOOKEEPER-1049 this issues happened much less frequently (2 sec linger), but I believe it was possible. The timing window is just wider now.

        Issue Links

          Activity

          Hide
          phunt Patrick Hunt added a comment -

          sendBuffer is eating (log then ignore) any Exception generated by sock.write - this is problematic.

          In this case - cancelledkeyexception, say the caller is watch notification. We'll end up flooding the logs with ERRORS, one for each failed notification. If we provided indication of failure back to the caller it could decide to stop sending notifications earlier.

          Show
          phunt Patrick Hunt added a comment - sendBuffer is eating (log then ignore) any Exception generated by sock.write - this is problematic. In this case - cancelledkeyexception, say the caller is watch notification. We'll end up flooding the logs with ERRORS, one for each failed notification. If we provided indication of failure back to the caller it could decide to stop sending notifications earlier.
          Hide
          fournc Camille Fournier added a comment -

          Why do we ignore that exception in sendBuffer, instead of closing the connection at that point?

          Show
          fournc Camille Fournier added a comment - Why do we ignore that exception in sendBuffer, instead of closing the connection at that point?
          Hide
          mahadev Mahadev konar added a comment -

          Moving it out of 3.4.0, not a blocker.

          Show
          mahadev Mahadev konar added a comment - Moving it out of 3.4.0, not a blocker.
          Hide
          eikenberry John Eikenberry added a comment -

          Using 3.3.5 these errors are filling up my logs making it harder to diagnose issues when they do happen. Is there any progress on this?

          Show
          eikenberry John Eikenberry added a comment - Using 3.3.5 these errors are filling up my logs making it harder to diagnose issues when they do happen. Is there any progress on this?
          Hide
          cls Chris Seawood added a comment -

          This patch downgrades the EndOfStream Exception to a debug message. It also adds .isValid() checks to avoid the CancelledKeyException.

          At a glance, the real problem appears to be that there's no way for the client to deregister itself from the server via the C API. According to zookeeper.h, calling zookeeper_close() is supposed to close the filehandle and free up resources (presumably on the client side) but afaict, there's nothing on the server side to acknowledge that a client has legitimately disconnected. In NIOServerCnxn.java, there's even a comment to the effect that when the server initiates a disconnect, it only closes the socket and then lets the doIO() routine clean things up, which results in the additional exceptions being thrown.

          Show
          cls Chris Seawood added a comment - This patch downgrades the EndOfStream Exception to a debug message. It also adds .isValid() checks to avoid the CancelledKeyException. At a glance, the real problem appears to be that there's no way for the client to deregister itself from the server via the C API. According to zookeeper.h, calling zookeeper_close() is supposed to close the filehandle and free up resources (presumably on the client side) but afaict, there's nothing on the server side to acknowledge that a client has legitimately disconnected. In NIOServerCnxn.java, there's even a comment to the effect that when the server initiates a disconnect, it only closes the socket and then lets the doIO() routine clean things up, which results in the additional exceptions being thrown.
          Hide
          tobe chendihao added a comment -

          Is there any progress now? Patrick Hunt
          We have the same problem when shutting down the thread without closing the client. Using 3.3.5.

          Show
          tobe chendihao added a comment - Is there any progress now? Patrick Hunt We have the same problem when shutting down the thread without closing the client. Using 3.3.5.
          Hide
          michim Michi Mutsuzaki added a comment -

          Is this still a problem in 3.4/3.5? How can I reproduce this?

          Show
          michim Michi Mutsuzaki added a comment - Is this still a problem in 3.4/3.5? How can I reproduce this?
          Hide
          yanpeng Lin Yanpeng Lin added a comment -

          I got this CancelledKeyException from 3.4.6.

          Show
          yanpeng Lin Yanpeng Lin added a comment - I got this CancelledKeyException from 3.4.6.
          Hide
          michim Michi Mutsuzaki added a comment -

          Thanks Yanpeng. Would it be easy to write a small testcase to reproduce the issue?

          Show
          michim Michi Mutsuzaki added a comment - Thanks Yanpeng. Would it be easy to write a small testcase to reproduce the issue?
          Hide
          clarkhaskins Clark Haskins added a comment -

          2014-06-23 19:43:20,227 - ERROR [CommitProcessor:3:NIOServerCnxn@445] - Unexpected Exception:
          java.nio.channels.CancelledKeyException
          at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
          at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
          at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
          at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
          at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1545)
          at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:115)
          at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:87)
          at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:577)
          at org.apache.zookeeper.server.DataTree.killSession(DataTree.java:829)
          at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:804)
          at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:328)
          at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:103)
          at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
          2014-06-23 19:43:20,227 - ERROR [CommitProcessor:3:NIOServerCnxn@445] - Unexpected Exception:
          java.nio.channels.CancelledKeyException
          at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
          at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
          at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
          at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
          at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1545)
          at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:115)
          at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:87)
          at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:577)
          at org.apache.zookeeper.server.DataTree.killSession(DataTree.java:829)
          at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:804)
          at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:328)
          at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:103)
          at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)

          Show
          clarkhaskins Clark Haskins added a comment - 2014-06-23 19:43:20,227 - ERROR [CommitProcessor:3:NIOServerCnxn@445] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509) at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1545) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:115) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:87) at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:577) at org.apache.zookeeper.server.DataTree.killSession(DataTree.java:829) at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:804) at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:328) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:103) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) 2014-06-23 19:43:20,227 - ERROR [CommitProcessor:3:NIOServerCnxn@445] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509) at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1545) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:115) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:87) at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:577) at org.apache.zookeeper.server.DataTree.killSession(DataTree.java:829) at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:804) at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:328) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:103) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
          Hide
          jonbringhurst Jon Bringhurst added a comment -

          Clark Haskins's stack trace is for Zookeeper 3.3.4.

          Show
          jonbringhurst Jon Bringhurst added a comment - Clark Haskins 's stack trace is for Zookeeper 3.3.4.
          Hide
          fpj Flavio Junqueira added a comment -

          I'm essentially wondering the same thing as Michi, are the additional checks in the proposed patch enough?

          Show
          fpj Flavio Junqueira added a comment - I'm essentially wondering the same thing as Michi, are the additional checks in the proposed patch enough?
          Hide
          getadroit@gmail.com adroit added a comment -

          I am using 3.4.6 and we have same issue.

          ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
          java.nio.channels.CancelledKeyException
          at sun.nio.ch.SelectionKeyImpl.ensureValid(Unknown Source)
          at sun.nio.ch.SelectionKeyImpl.interestOps(Unknown Source)
          at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
          at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
          at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1118)
          at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:120)
          at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:92)
          at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:591)
          at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:802)
          at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:329)
          at org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:994)
          at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:116)
          at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)

          Show
          getadroit@gmail.com adroit added a comment - I am using 3.4.6 and we have same issue. ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(Unknown Source) at sun.nio.ch.SelectionKeyImpl.interestOps(Unknown Source) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081) at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1118) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:120) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:92) at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:591) at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:802) at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:329) at org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:994) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:116) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
          Hide
          arcadius Arcadius Ahouansou added a comment -

          Any progress on this please?

          Show
          arcadius Arcadius Ahouansou added a comment - Any progress on this please?
          Hide
          DEvil0000 Alexander Binzberger added a comment -

          looks similar to ZOOKEEPER-2044 - could this be the same thing?

          Show
          DEvil0000 Alexander Binzberger added a comment - looks similar to ZOOKEEPER-2044 - could this be the same thing?

            People

            • Assignee:
              Unassigned
              Reporter:
              phunt Patrick Hunt
            • Votes:
              16 Vote for this issue
              Watchers:
              39 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development