MINA
  1. MINA
  2. DIRMINA-827

NioSocketConnector leaks the last open NioSocketSession after close

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.2
    • Fix Version/s: 2.0.3
    • Component/s: Core
    • Environment:
      Windows 7/Solaris SPARC/Solaris x86
      Java versions 6u18 & 6u24

      Description

      My company's program uses MINA to make multiple (usually) simultaneous connections to various other machines, while reusing the same NioSocketConnector for each new connection. For better or worse, we store various objects in the IoSession's attributes, which we expect to get released on close. This is not always the case, however, as said session remains in memory until either a new connection is made or the IoConnector is disposed. After writing the simplest Connector program I could (I have several servers available on my network, so I did not write a matching Acceptor), and performing some profiling and debugging, I was able to confirm the leak and identify the issue.

      Below is my Connector test program

      public class MinaMain {
          private static final Logger LOGGER = LoggerFactory.getLogger(MinaMain.class);
      
          public static void main(String[] args) throws InterruptedException {
              LOGGER.trace("Waiting for YourKit to start");
              Thread.sleep(15000);
      
              NioSocketConnector connector = new NioSocketConnector();
              connector.setHandler(new IoHandlerAdapter());
              closeSession(getConnectedSession(connector));
      
      //        LOGGER.info("Creating and closing 5 sessions in series");
      //
      //        for (int x = 0; x < 5; x++) {
      //            IoSession session = getConnectedSession(connector);
      //
      //            if (session == null) {
      //                continue;
      //            }
      //
      //            closeSession(session);
      //        }
      //
      //        LOGGER.info("Creating 5 sessions and then closing 5 sessions");
      //        IoSession[] sessions = new IoSession[5];
      //
      //        for (int x = 0; x < 5; x++) {
      //            sessions[x] = getConnectedSession(connector);
      //        }
      //
      //        for (int x = 0; x < 5; x++) {
      //            IoSession session = sessions[x];
      //
      //            if (session != null) {
      //                closeSession(session);
      //                sessions[x] = null;
      //            }
      //        }
      
              LOGGER.info("Test complete. Sleeping for 60s");
              Thread.sleep(60000);
          }
      
          private static IoSession getConnectedSession(IoConnector connector) throws InterruptedException {
              IoSession session = null;
      
              try {
                  ConnectFuture future = connector.connect(new InetSocketAddress("134.64.37.183", 11109));
      
                  future.addListener(new IoFutureListener<ConnectFuture>() {
                      @Override
                      public void operationComplete(ConnectFuture future) {
                          LOGGER.debug("Connection completed callback for session " + future.getSession().getId());
                      }
                  });
      
                  future.awaitUninterruptibly();
                  LOGGER.debug("Connection created for session " + future.getSession().getId());
                  session = future.getSession();
                  Thread.sleep(15000);
              }
              catch (Exception e) {
                  LOGGER.error("Failed to connect", e);
              }
      
              return session;
          }
      
          private static void closeSession(IoSession session) throws InterruptedException {
              try {
                  CloseFuture closeFuture = session.getCloseFuture();
      
                  closeFuture.addListener(new IoFutureListener<CloseFuture>() {
                      @Override
                      public void operationComplete(CloseFuture future) {
                          LOGGER.debug("Session closed callback for session " + future.getSession().getId());
                      }
                  });
      
                  LOGGER.debug("Attempting to close session " + session.getId());
                  session.close(false);
      
                  LOGGER.debug("IoSession.close(false) returned. Awaiting uninterruptibily for session " + session.getId());
                  closeFuture.awaitUninterruptibly();
      
                  LOGGER.debug("Close completed for session " + session.getId());
              }
              catch (Exception e) {
                  LOGGER.error("Failed to close session " + session.getId());
                  session.close(true);
              }
      
              Thread.sleep(15000);
          }
      }
      

      Attached is the hprof file of my program, taken after the program has run (during the 60 second sleep at the end) and a screen shot of the GC Root of the leaked NioSocketSession.

      After a couple days of debugging, I've identified why this is happening. Below is a snippet from the AbstractPollingIoConnector.Connector thread

      private class Connector implements Runnable {
      
              public void run() {
                  int nHandles = 0;
                  while (selectable) {
                      try {
                          // the timeout for select shall be smaller of the connect
                          // timeout or 1 second...
                          int timeout = (int)Math.min(getConnectTimeoutMillis(), 1000L);
                          int selected = select(timeout);
      
                          nHandles += registerNew();
      
                          if (selected > 0) {
                              nHandles -= processConnections(selectedHandles());
                          }
      
                          processTimedOutSessions(allHandles());
      
                          nHandles -= cancelKeys();
      
                          if (nHandles == 0) {
                              synchronized (lock) {
                                  if (connectQueue.isEmpty()) {
                                      connector = null;
                                      break;
                                  }
                              }
                          }
                      }
      
      ...
      

      When IoSession.close(boolean) is called, the above Connector thread cancels the SelectionKey associated with the session, which reduces the number of handles from 1 to 0. Immediately afterwards, the thread stops executing because there are no handles and nothing in the connetQueue, however, the previously cancelled key is still in the Selector because its cancelledKeys Set is not processed until the next call to select(int)

      Calling dispose on the IoConnector restarts the Connector thread, which calls Selector.select(int), at which time, the Selector cleans out its canclledKeys.
      Creating a new connection causes the same thing to happen, because the new session also recreates the AbstractPollingIoConnector.Connector thread, as there is a new handle to be processed.

      1. MinaMain-2011-04-10.hprof
        4.83 MB
        Matt Yates
      2. mina-core-2.0.4-SNAPSHOT.jar
        630 kB
        Emmanuel Lecharny
      3. mina-core-2.0.4-SNAPSHOT.jar
        630 kB
        Emmanuel Lecharny
      4. ASF.LICENSE.NOT.GRANTED--sessionGcRoot.jpg
        37 kB
        Matt Yates
      5. ASF.LICENSE.NOT.GRANTED--IoConnectorWithCleanSelector.jpg
        128 kB
        Matt Yates

        Activity

        Hide
        Emmanuel Lecharny added a comment -

        Definitively included in 2.0.3

        2.0.3 vote has been closed 2 hours ago, the release is on its way. It will be announced tomorrow, because we have to wait for the mirrors to be updated all over the planet.

        Show
        Emmanuel Lecharny added a comment - Definitively included in 2.0.3 2.0.3 vote has been closed 2 hours ago, the release is on its way. It will be announced tomorrow, because we have to wait for the mirrors to be updated all over the planet.
        Hide
        Matt Yates added a comment -

        Is this going into 2.0.3? There is no fix-version assigned... Also, is there an ETA for 2.0.3? Thanks!

        Show
        Matt Yates added a comment - Is this going into 2.0.3? There is no fix-version assigned... Also, is there an ETA for 2.0.3? Thanks!
        Hide
        Matt Yates added a comment - - edited

        I can't really tell if it's blocking, b/c it never blocks for more than 1 second, even if you set a large timeout

        int timeout = (int)Math.min(getConnectTimeoutMillis(), 1000L);

        Regardless, moving the check for zero handles works, and with the extra insurance of the wakeup() call, I think we have a winner.

        Thank you very much for your attention/work on this issue!

        Show
        Matt Yates added a comment - - edited I can't really tell if it's blocking, b/c it never blocks for more than 1 second, even if you set a large timeout int timeout = (int)Math.min(getConnectTimeoutMillis(), 1000L); Regardless, moving the check for zero handles works, and with the extra insurance of the wakeup() call, I think we have a winner. Thank you very much for your attention/work on this issue!
        Show
        Emmanuel Lecharny added a comment - Fixed with http://svn.apache.org/viewvc?rev=1091209&view=rev
        Hide
        Emmanuel Lecharny added a comment -

        Ok, great.

        i'll cancel the vote for 2.0.3 right now, wait for your feedback about teh wakeup() (it should have no impact), and start a new vote tomorrow morning.

        It's quite a good timing, as we just fixed a couple of other issues in the code base, so I don't really mind cancelling the release. All in all, it's just 20 minutes to cut a new one, and 72 hours to wait before it's official.

        Tahnks !

        Show
        Emmanuel Lecharny added a comment - Ok, great. i'll cancel the vote for 2.0.3 right now, wait for your feedback about teh wakeup() (it should have no impact), and start a new vote tomorrow morning. It's quite a good timing, as we just fixed a couple of other issues in the code base, so I don't really mind cancelling the release. All in all, it's just 20 minutes to cut a new one, and 72 hours to wait before it's official. Tahnks !
        Hide
        Matt Yates added a comment -

        I'll play around with wakeup vs non wakeup later today

        Show
        Matt Yates added a comment - I'll play around with wakeup vs non wakeup later today
        Hide
        Matt Yates added a comment -

        It looks like that worked! There's no SelectionKeys left in the cancellKeys or the channelArray of the Selector, which means no lingering IoSession.

        Show
        Matt Yates added a comment - It looks like that worked! There's no SelectionKeys left in the cancellKeys or the channelArray of the Selector, which means no lingering IoSession.
        Hide
        Emmanuel Lecharny added a comment -

        Same jar, with a wakeup()

        Show
        Emmanuel Lecharny added a comment - Same jar, with a wakeup()
        Hide
        Emmanuel Lecharny added a comment -

        Adding this portion of code in the cancelKeys method could help :

        private int cancelKeys() {
        int nHandles = 0;

        for (;

        { ... }

        // If we had some cancelled keys, wake up the selector.
        if ( nHandles > 0 )

        { wakeup(); }

        return nHandles;
        }

        Show
        Emmanuel Lecharny added a comment - Adding this portion of code in the cancelKeys method could help : private int cancelKeys() { int nHandles = 0; for (; { ... } // If we had some cancelled keys, wake up the selector. if ( nHandles > 0 ) { wakeup(); } return nHandles; }
        Hide
        Emmanuel Lecharny added a comment -

        wakeup() just makes sure that select() won't block. I haven't added it in the code atm, but I think I have to, because I'm not sure the select(TIMEOUT) won't block if we just have some cancelledKeys.

        The best way to check would be to use a longer timeount (like 30s) and see what happens.

        Show
        Emmanuel Lecharny added a comment - wakeup() just makes sure that select() won't block. I haven't added it in the code atm, but I think I have to, because I'm not sure the select(TIMEOUT) won't block if we just have some cancelledKeys. The best way to check would be to use a longer timeount (like 30s) and see what happens.
        Hide
        Matt Yates added a comment -

        Ah, I see. Since I'm using the IoHandlerAdapter as my IoHandler implmentation, that would explain why nothing happened. Calling wakeup() might work, so long as it triggers a select(), but I can't tell because wakeup() dives into native code

        Show
        Matt Yates added a comment - Ah, I see. Since I'm using the IoHandlerAdapter as my IoHandler implmentation, that would explain why nothing happened. Calling wakeup() might work, so long as it triggers a select(), but I can't tell because wakeup() dives into native code
        Hide
        Emmanuel Lecharny added a comment -

        A new version of the lib, with the suggested modification.

        Can you test it and give me some feedback ?

        Thanks !

        Show
        Emmanuel Lecharny added a comment - A new version of the lib, with the suggested modification. Can you test it and give me some feedback ? Thanks !
        Hide
        Emmanuel Lecharny added a comment -

        The sessionClose() is an even handler by the IoHandler(). It gives the application a chance to cleanup correctly the sessions.

        However, my patch was not good enough, as the attachement was still present in the SelectionKeyn and in the CancelledKeys set.

        I'm just thinking we should also wakeup() the select, otherwise, we may be blocked, not sure though.

        Let me patch agan, following your siggestion, and attach a new version of the jar.

        Show
        Emmanuel Lecharny added a comment - The sessionClose() is an even handler by the IoHandler(). It gives the application a chance to cleanup correctly the sessions. However, my patch was not good enough, as the attachement was still present in the SelectionKeyn and in the CancelledKeys set. I'm just thinking we should also wakeup() the select, otherwise, we may be blocked, not sure though. Let me patch agan, following your siggestion, and attach a new version of the jar.
        Hide
        Matt Yates added a comment - - edited

        I'm not sure what 'sessionClosed()' event to which you are referring, but it does not seem to fix the problem. What if you simply moved the 'if (nHandles == 0)' block up to happen before the handle count potentially shrinks?

        Currently:

        ...
        int selected = select(timeout);
        nHandles += registerNew();

        if (selected > 0)

        { nHandles -= processConnections(selectedHandles()); }

        processTimedOutSessions(allHandles());
        nHandles -= cancelKeys();

        if (nHandles == 0) {
        synchronized (lock) {
        if (connectQueue.isEmpty()) { connector = null; break; }
        }
        }
        ...

        Proposed:

        ...
        int selected = select(timeout);
        nHandles += registerNew()

        if (nHandles == 0) {
        synchronized (lock) {
        if (connectQueue.isEmpty()) { connector = null; break; }
        }
        }

        if (selected > 0) { nHandles -= processConnections(selectedHandles()); }

        processTimedOutSessions(allHandles());
        nHandles -= cancelKeys();
        ...

        This way if processConnections(Iterator<SocketChannel>) or cancelKeys() removes a handle, the Selector still gets one last select(int) call to clean up the next time through, and if a new connection is created in that time, the thread won't be destroyed and immediately recreated.

        Show
        Matt Yates added a comment - - edited I'm not sure what 'sessionClosed()' event to which you are referring, but it does not seem to fix the problem. What if you simply moved the 'if (nHandles == 0)' block up to happen before the handle count potentially shrinks? Currently: ... int selected = select(timeout); nHandles += registerNew(); if (selected > 0) { nHandles -= processConnections(selectedHandles()); } processTimedOutSessions(allHandles()); nHandles -= cancelKeys(); if (nHandles == 0) { synchronized (lock) { if (connectQueue.isEmpty()) { connector = null; break; } } } ... Proposed: ... int selected = select(timeout); nHandles += registerNew() if (nHandles == 0) { synchronized (lock) { if (connectQueue.isEmpty()) { connector = null; break; } } } if (selected > 0) { nHandles -= processConnections(selectedHandles()); } processTimedOutSessions(allHandles()); nHandles -= cancelKeys(); ... This way if processConnections(Iterator<SocketChannel>) or cancelKeys() removes a handle, the Selector still gets one last select(int) call to clean up the next time through, and if a new connection is created in that time, the thread won't be destroyed and immediately recreated.
        Hide
        Emmanuel Lecharny added a comment -

        I have patched the code to call the sessionClosed() event when the sessions are being closed in the cancelKeys() loop. The mina-core har is attached to this JIRA

        Can you give it a try and tell me if it fixes the issue ?

        Thanks !

        Show
        Emmanuel Lecharny added a comment - I have patched the code to call the sessionClosed() event when the sessions are being closed in the cancelKeys() loop. The mina-core har is attached to this JIRA Can you give it a try and tell me if it fixes the issue ? Thanks !
        Hide
        Victor N added a comment -

        Can confirm that the problem exists - we are using connector too.

        Show
        Victor N added a comment - Can confirm that the problem exists - we are using connector too.
        Hide
        Emmanuel Lecharny added a comment -

        I have to investigate more closely the problem.

        Just wondering if catching the sessionClosed() event in your handler, and clean up the context there, could help.

        i'll do that tomorrow, and if the problem can't be fixed in the application, I'll try to fix it n the code, which means I'll psotpone the release '(2.0.3 is currently being voted), and start a new vote as soon as I get a fix for your problem.

        Show
        Emmanuel Lecharny added a comment - I have to investigate more closely the problem. Just wondering if catching the sessionClosed() event in your handler, and clean up the context there, could help. i'll do that tomorrow, and if the problem can't be fixed in the application, I'll try to fix it n the code, which means I'll psotpone the release '(2.0.3 is currently being voted), and start a new vote as soon as I get a fix for your problem.
        Hide
        Matt Yates added a comment -

        close(true) has the same issue. All it does is skip the processor.flush(session) part of the socket shutdown data flow. The AbstractPollingIoConnector.Connector thread still dies before Selector.select(int) is called again, preventing the SelectionKey (and therefore the session) from being cleaned up

        Show
        Matt Yates added a comment - close(true) has the same issue. All it does is skip the processor.flush(session) part of the socket shutdown data flow. The AbstractPollingIoConnector.Connector thread still dies before Selector.select(int) is called again, preventing the SelectionKey (and therefore the session) from being cleaned up
        Hide
        Emmanuel Lecharny added a comment -

        Try with session.close(true)

        Show
        Emmanuel Lecharny added a comment - Try with session.close(true)
        Hide
        Matt Yates added a comment -

        Here is the path of strong references that are keeping the session from being released

        Show
        Matt Yates added a comment - Here is the path of strong references that are keeping the session from being released

          People

          • Assignee:
            Emmanuel Lecharny
            Reporter:
            Matt Yates
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development