  1. MINA
  2. DIRMINA-760

Client fails to detect disconnection

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.14
    • Component/s: Core
    • Labels:
      None

      Description

      Tested against revision 901694 (which is a bit after 2.0.0-RC1)

      My client needs to maintain an open connection to the server.
      If I kill the server right before calling session.write(), the client does not detect that the server is gone.
      there is no exceptionCaught event, and messageSent is actually called (which suggests successful delivery).

        Activity

        omry Omry Yadan added a comment -

        Note:
        I am testing on Linux with JDK 1.6.0_12

        elecharny Emmanuel Lecharny added a comment -

        Updated the versions

        elecharny Emmanuel Lecharny added a comment -

        Moved to 2.0

        I confirm the issue. If you try to check whether the connection has been closed, you can even block forever on a synchronized(lock) block. It seems that the way we handle the connection with a ConnectFuture is dorked.

        elecharny Emmanuel Lecharny added a comment -

        We need a unit test to demonstrate the issue.

        elecharny Emmanuel Lecharny added a comment -

        Postponed to 2.0.1

        antoine.tran Antoine Tran added a comment - - edited

        I confirm this issue with the latest version, 2.0.9. I followed the same steps as the reporter. exceptionCaught is not called, but I have not tested whether messageSent was called.

        Moreover, I can add that at the socket level, when I killed the server process, the client socket status was ESTABLISHED, then disappeared from netstat. So Mina should be able to detect disconnection, at least in this scenario.

        My test environment is Linux Red Hat 6.5.

        Is there any information on when this will be fixed, or on what we can do as a workaround? This issue is very important for us. Thank you.

        UPDATE: I worked around this by using the sessionClosed method. Using exceptionCaught alone is not enough. But still, Mina should throw an exception when the session is closed.

        elecharny Emmanuel Lecharny added a comment -

        Do you have a simple unit test that we can use to debug the issue?

        Thanks !

        elecharny Emmanuel Lecharny added a comment -

        When a peer brutally closes its side of the connection, there is no way for the other peer to know about it. This is the way TCP works. The only solution is to periodically 'test' the connection: if it has been closed by the other peer, you'll get an error. This is why we have a KeepAlive filter and a sessionIdle() event: to be able to detect this use case.

        omry Omry Yadan added a comment - - edited

        Emmanuel,
        Quoting the task:
        "If I kill the server right before calling session.write(), the client does not detect that the server is gone.
        there is no exceptionCaught event, and messageSent is actually called (which suggests successful delivery)."

        TCP does support detection of this scenario pretty reliably because the session.write() is equivalent to a connection test.
        I don't really care either way at this point, but it feels like the wrong resolution.

        elecharny Emmanuel Lecharny added a comment -

        Feel free to reopen the ticket, no problem.

        What I would need is a piece of code that demonstrates the problem, so that I can get this fixed. Is it possible to get one?

        Thanks !

        omry Omry Yadan added a comment -

        I no longer have the client code, but it sounds like it should be simple enough to create a client that connects to a server, nuke the server with kill -9, and then have the client call session.write().

        elecharny Emmanuel Lecharny added a comment -

        I'll give it a try in the next 3 hours, which I have to spend on a train anyway...

        elecharny Emmanuel Lecharny added a comment -

        Reopened

        elecharny Emmanuel Lecharny added a comment -

        OK, I reused the chat server as a test, had a client send 100 000 messages, and killed the server in the middle of it. Here is the stack trace I get on the client:

        [20:37:48] NioProcessor-2 INFO  [] [localhost/127.0.0.1:1234] [org.apache.mina.filter.logging.LoggingFilter] - SENT: BROADCAST abcd22401
        [20:37:48] NioProcessor-2 INFO  [] [localhost/127.0.0.1:1234] [org.apache.mina.filter.logging.LoggingFilter] - SENT: BROADCAST abcd22402
        [20:37:48] NioProcessor-2 WARN  [] [localhost/127.0.0.1:1234] [org.apache.mina.filter.logging.LoggingFilter] - EXCEPTION :
        org.apache.mina.core.write.WriteToClosedSessionException
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.clearWriteRequestQueue(AbstractPollingIoProcessor.java:625)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.removeNow(AbstractPollingIoProcessor.java:568)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.writeBuffer(AbstractPollingIoProcessor.java:915)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.flushNow(AbstractPollingIoProcessor.java:835)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.flush(AbstractPollingIoProcessor.java:762)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$700(AbstractPollingIoProcessor.java:68)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1108)
        	at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        	at java.lang.Thread.run(Thread.java:745)
        [20:37:48] NioProcessor-2 WARN  [] [localhost/127.0.0.1:1234] [org.apache.mina.core.service.IoHandlerAdapter] - EXCEPTION, please implement org.apache.mina.example.chat.client.SwingChatClientHandler.exceptionCaught() for proper handling:
        org.apache.mina.core.write.WriteToClosedSessionException
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.clearWriteRequestQueue(AbstractPollingIoProcessor.java:625)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.removeNow(AbstractPollingIoProcessor.java:568)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.writeBuffer(AbstractPollingIoProcessor.java:915)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.flushNow(AbstractPollingIoProcessor.java:835)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.flush(AbstractPollingIoProcessor.java:762)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$700(AbstractPollingIoProcessor.java:68)
        	at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1108)
        	at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        	at java.lang.Thread.run(Thread.java:745)
        [20:37:48] NioProcessor-2 INFO  [] [localhost/127.0.0.1:1234] [org.apache.mina.filter.logging.LoggingFilter] - CLOSED
        

        Seems like it does what is expected (at least in the version I'm currently working on).

        omry Omry Yadan added a comment -

        That's good news.
        This is so old that it could have already been fixed, either in Mina or in the JVM itself.

        If you want to be 100% sure it's gone, try to follow the sequence I outlined in the description.
        Specifically, I killed the server after the client connects and before the client sends a message with session.write().

        Otherwise, feel free to close.

        elecharny Emmanuel Lecharny added a comment -

        I'll do the test you proposed. It's easy, and if it works, I'll close the issue.

        Thanks !

        elecharny Emmanuel Lecharny added a comment -

        Not that easy to reproduce : doing a kill -9 on the server will properly close the connection, and the client will disconnect. The best solution to reproduce the scenario would be to have a remote server, connect the client, and pull the network cable from the server.
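
        The clean-close case described above can be observed with plain java.net sockets, without MINA. The following is only a sketch under stated assumptions: the class and method names are hypothetical, both ends run on the loopback interface, and an explicit close() stands in for the operating system closing the sockets of a killed process. After the peer closes, the client's read() reports end-of-stream.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of MINA.
class PeerCloseDemo {

    /** Returns what the client's read() reports once the server side has closed the connection. */
    static int readAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // Accept the connection and close it immediately. This stands in for the OS
            // closing the sockets of a process that was killed.
            server.accept().close();
            // Safety net: fail with a timeout instead of blocking forever.
            client.setSoTimeout(5000);
            // read() returns -1 when the peer has closed its side (end of stream).
            return client.getInputStream().read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        A return value of -1 is the signal a selector-based framework can turn into a "session closed" notification; nothing in this sketch claims to show how MINA implements it.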

        But in this case, there is nothing we can do: we have to wait for the underlying socket to time out, and that depends entirely on your OS configuration.

        The only way to 'detect' this use case is to combine the write with the idle status check. If you have done a write and the connection is idle for a moment (you decide how long is acceptable), then there is a problem.
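
        The write-plus-idle check described above could be sketched as a small bookkeeping class. Everything here is an assumption made for illustration: the class name, the method names, and the premise that the application protocol makes the peer send something back (a reply or a heartbeat) after a write. In a MINA handler, the three methods would presumably be driven from messageSent(), messageReceived() and sessionIdle().

```java
// Hypothetical helper, not part of MINA: flags a connection as suspect when a write
// has been followed by silence from the peer for longer than an acceptable delay.
class WriteIdleWatchdog {
    private final long maxSilenceMillis;
    // Time of the oldest write still waiting for any inbound traffic; -1 if none.
    private long pendingSinceMillis = -1;

    WriteIdleWatchdog(long maxSilenceMillis) {
        this.maxSilenceMillis = maxSilenceMillis;
    }

    /** Call when a message has been written. Keeps the oldest unanswered write time. */
    void onWrite(long nowMillis) {
        if (pendingSinceMillis < 0) {
            pendingSinceMillis = nowMillis;
        }
    }

    /** Call when anything is received from the peer: the connection is evidently alive. */
    void onRead(long nowMillis) {
        pendingSinceMillis = -1;
    }

    /** Call from the idle check: true if a write has gone unanswered for too long. */
    boolean isSuspect(long nowMillis) {
        return pendingSinceMillis >= 0 && nowMillis - pendingSinceMillis > maxSilenceMillis;
    }
}
```

        Passing the clock in as a parameter keeps the sketch easy to test; how long a silence is acceptable remains an application decision, as noted above.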

        To be clear: when you 'write' some data, it ends up as bytes written to a system buffer, which will be read and sent to the remote peer later. If the OS can't send the data, it will retry many times, and this can take quite a while (check the tcp_retries1 and tcp_retries2 parameters of your OS, but that may be up to 30 mins...).
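
        The buffering point can be illustrated with plain java.net sockets. This is a minimal sketch with hypothetical names, assuming a loopback connection and a payload small enough to fit in the socket buffers: the server side never accepts the connection or reads a byte, yet the client's write completes without error.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of MINA.
class BufferedWriteDemo {

    /**
     * Writes 'size' bytes to a peer that never accepts or reads the connection.
     * Returns the number of bytes handed to the OS without an exception.
     */
    static int writeToSilentPeer(int size) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            OutputStream out = client.getOutputStream();
            // The write returns as soon as the bytes are copied into a system buffer.
            // Nobody on the other side has read anything at this point.
            out.write(new byte[size]);
            out.flush();
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        A successful return here only means the OS accepted the bytes, which is consistent with a "sent" notification firing even though the peer never consumes the data.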

        omry Omry Yadan added a comment -

        Based on my comment "If I kill the server right before calling session.write(), the client does not detect that the server is gone":
        when you kill with -9, does the client now detect that the server is gone?

        elecharny Emmanuel Lecharny added a comment -

        Yes, absolutely.

        omry Omry Yadan added a comment -

        So let's close it. The bug report was about the client not detecting that the server is gone in this scenario, and it looks like that is already fixed.

        elecharny Emmanuel Lecharny added a comment -

        Fixed


          People

          • Assignee:
            Unassigned
            Reporter:
            omry Omry Yadan