ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1237

ERRORs being logged when queued responses are sent after socket has closed.

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.4, 3.4.0, 3.5.0
    • Fix Version/s: 3.5.0
    • Component/s: server
    • Labels:
      None

      Description

      After applying ZOOKEEPER-1049 to 3.3.3 (I believe the same problem exists in 3.4/3.5 but haven't tested this) I'm seeing the following exception more frequently:

      Oct 19, 1:31:53 PM ERROR
      Unexpected Exception:
      java.nio.channels.CancelledKeyException
      at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
      at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
      at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
      at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
      at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
      at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
      

      This is a long standing problem where we try to send a response after the socket has been closed. Prior to ZOOKEEPER-1049 this issues happened much less frequently (2 sec linger), but I believe it was possible. The timing window is just wider now.

        Activity

        Hide
        Patrick Hunt added a comment -

        sendBuffer is eating (log then ignore) any Exception generated by sock.write - this is problematic.

        In this case - cancelledkeyexception, say the caller is watch notification. We'll end up flooding the logs with ERRORS, one for each failed notification. If we provided indication of failure back to the caller it could decide to stop sending notifications earlier.

        Show
        Patrick Hunt added a comment - sendBuffer is eating (log then ignore) any Exception generated by sock.write - this is problematic. In this case - cancelledkeyexception, say the caller is watch notification. We'll end up flooding the logs with ERRORS, one for each failed notification. If we provided indication of failure back to the caller it could decide to stop sending notifications earlier.
        Hide
        Camille Fournier added a comment -

        Why do we ignore that exception in sendBuffer, instead of closing the connection at that point?

        Show
        Camille Fournier added a comment - Why do we ignore that exception in sendBuffer, instead of closing the connection at that point?
        Hide
        Mahadev konar added a comment -

        Moving it out of 3.4.0, not a blocker.

        Show
        Mahadev konar added a comment - Moving it out of 3.4.0, not a blocker.
        Hide
        John Eikenberry added a comment -

        Using 3.3.5 these errors are filling up my logs making it harder to diagnose issues when they do happen. Is there any progress on this?

        Show
        John Eikenberry added a comment - Using 3.3.5 these errors are filling up my logs making it harder to diagnose issues when they do happen. Is there any progress on this?
        Hide
        Chris Seawood added a comment -

        This patch downgrades the EndOfStream Exception to a debug message. It also adds .isValid() checks to avoid the CancelledKeyException.

        At a glance, the real problem appears to be that there's no way for the client to deregister itself from the server via the C API. According to zookeeper.h, calling zookeeper_close() is supposed to close the filehandle and free up resources (presumably on the client side) but afaict, there's nothing on the server side to acknowledge that a client has legitimately disconnected. In NIOServerCnxn.java, there's even a comment to the effect that when the server initiates a disconnect, it only closes the socket and then lets the doIO() routine clean things up, which results in the additional exceptions being thrown.

        Show
        Chris Seawood added a comment - This patch downgrades the EndOfStream Exception to a debug message. It also adds .isValid() checks to avoid the CancelledKeyException. At a glance, the real problem appears to be that there's no way for the client to deregister itself from the server via the C API. According to zookeeper.h, calling zookeeper_close() is supposed to close the filehandle and free up resources (presumably on the client side) but afaict, there's nothing on the server side to acknowledge that a client has legitimately disconnected. In NIOServerCnxn.java, there's even a comment to the effect that when the server initiates a disconnect, it only closes the socket and then lets the doIO() routine clean things up, which results in the additional exceptions being thrown.
        Hide
        chendihao added a comment -

        Is there any progress now? Patrick Hunt
        We have the same problem when shutting down the thread without closing the client. Using 3.3.5.

        Show
        chendihao added a comment - Is there any progress now? Patrick Hunt We have the same problem when shutting down the thread without closing the client. Using 3.3.5.
        Hide
        Michi Mutsuzaki added a comment -

        Is this still a problem in 3.4/3.5? How can I reproduce this?

        Show
        Michi Mutsuzaki added a comment - Is this still a problem in 3.4/3.5? How can I reproduce this?
        Hide
        Yanpeng Lin added a comment -

        I got this CancelledKeyException from 3.4.6.

        Show
        Yanpeng Lin added a comment - I got this CancelledKeyException from 3.4.6.
        Hide
        Michi Mutsuzaki added a comment -

        Thanks Yanpeng. Would it be easy to write a small testcase to reproduce the issue?

        Show
        Michi Mutsuzaki added a comment - Thanks Yanpeng. Would it be easy to write a small testcase to reproduce the issue?
        Hide
        Clark Haskins added a comment -

        2014-06-23 19:43:20,227 - ERROR [CommitProcessor:3:NIOServerCnxn@445] - Unexpected Exception:
        java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
        at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1545)
        at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:115)
        at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:87)
        at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:577)
        at org.apache.zookeeper.server.DataTree.killSession(DataTree.java:829)
        at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:804)
        at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:328)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:103)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
        2014-06-23 19:43:20,227 - ERROR [CommitProcessor:3:NIOServerCnxn@445] - Unexpected Exception:
        java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
        at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1545)
        at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:115)
        at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:87)
        at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:577)
        at org.apache.zookeeper.server.DataTree.killSession(DataTree.java:829)
        at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:804)
        at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:328)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:103)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)

        Show
        Clark Haskins added a comment - 2014-06-23 19:43:20,227 - ERROR [CommitProcessor:3:NIOServerCnxn@445] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509) at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1545) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:115) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:87) at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:577) at org.apache.zookeeper.server.DataTree.killSession(DataTree.java:829) at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:804) at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:328) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:103) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) 2014-06-23 19:43:20,227 - ERROR [CommitProcessor:3:NIOServerCnxn@445] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509) at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1545) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:115) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:87) at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:577) at org.apache.zookeeper.server.DataTree.killSession(DataTree.java:829) at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:804) at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:328) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:103) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
        Hide
        Jon Bringhurst added a comment -

        Clark Haskins's stack trace is for Zookeeper 3.3.4.

        Show
        Jon Bringhurst added a comment - Clark Haskins 's stack trace is for Zookeeper 3.3.4.
        Hide
        Flavio Junqueira added a comment -

        I'm essentially wondering the same thing as Michi, are the additional checks in the proposed patch enough?

        Show
        Flavio Junqueira added a comment - I'm essentially wondering the same thing as Michi, are the additional checks in the proposed patch enough?
        Hide
        adroit added a comment -

        I am using 3.4.6 and we have same issue.

        ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
        java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(Unknown Source)
        at sun.nio.ch.SelectionKeyImpl.interestOps(Unknown Source)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
        at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1118)
        at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:120)
        at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:92)
        at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:591)
        at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:802)
        at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:329)
        at org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:994)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:116)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)

        Show
        adroit added a comment - I am using 3.4.6 and we have same issue. ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(Unknown Source) at sun.nio.ch.SelectionKeyImpl.interestOps(Unknown Source) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081) at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1118) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:120) at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:92) at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:591) at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:802) at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:329) at org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:994) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:116) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)

          People

          • Assignee:
            Unassigned
            Reporter:
            Patrick Hunt
          • Votes:
            3 Vote for this issue
            Watchers:
            22 Start watching this issue

            Dates

            • Created:
              Updated:

              Development