MINA SSHD
  1. MINA SSHD
  2. SSHD-123

TcpipForward race condition & deadlock on client disconnect.

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.0, 0.5.0, 0.6.0
    • Fix Version/s: 0.7.0
    • Labels:
      None
    • Environment:
      I am able to reproduce this every single time on my quad core windows 7 machine.

      Description

      If a client with an active remote port forward disconnects at about the same time a new connection comes in to that port forward two NioProcessor threads end up deadlocked in TcpipForwardSupport, at the following spots:

      NioProcessor-14 stuck at org.apache.sshd.server.session.TcpipForwardSupport.sessionCreated(...):
      OpenFuture future = channel.open().await();

      NioProcessor-2 stuck at org.apache.sshd.server.session.TcpipForwardSupport.close():
      acceptor.dispose();

      It appears that the new connection to the port forward is trying to create a new channel over a session that is currently trying to close. Both threads end up waiting on different objects, and are never notified.

      1. SSHD_123_Test.java
        9 kB
        Bill Kuker
      2. SSHD-123.patch
        1 kB
        Bill Kuker
      3. SSHD-123-Stacks.test.txt
        4 kB
        Bill Kuker

        Activity

        Hide
        Bill Kuker added a comment -

        This is a test that reliably exposes the problem, at least on my system. It creates a port forward, makes a connection through it and immediately severs the client connection.

        The test looks through the stack traces of the NioAcceptor threads to look for this specific condition, and times out after 20 seconds. Its probably not a good test to include in the automated unite tests.

        Show
        Bill Kuker added a comment - This is a test that reliably exposes the problem, at least on my system. It creates a port forward, makes a connection through it and immediately severs the client connection. The test looks through the stack traces of the NioAcceptor threads to look for this specific condition, and times out after 20 seconds. Its probably not a good test to include in the automated unite tests.
        Hide
        Bill Kuker added a comment -

        These are the stack traces of the two stuck threads, and the variables from the stack frame just above the calls to Object.wait()

        Show
        Bill Kuker added a comment - These are the stack traces of the two stuck threads, and the variables from the stack frame just above the calls to Object.wait()
        Hide
        Bill Kuker added a comment - - edited

        This patch fixes the problem for me.

        In TcpipForwardSupport.sessionCreated() I wait for the ChannelForwardedTcpip with a timeout, in a loop, and each time the timeout occurs I peek at the acceptor that is trying to dispose in the other thread.

        If that acceptor is disposing or disposed I give up on the port forward.

        It may be more robust to wait with a longer timeout and no loop, this patch simulates the behavior of the original unbounded wait while allowing the acceptor to close simultaneously.

        Show
        Bill Kuker added a comment - - edited This patch fixes the problem for me. In TcpipForwardSupport.sessionCreated() I wait for the ChannelForwardedTcpip with a timeout, in a loop, and each time the timeout occurs I peek at the acceptor that is trying to dispose in the other thread. If that acceptor is disposing or disposed I give up on the port forward. It may be more robust to wait with a longer timeout and no loop, this patch simulates the behavior of the original unbounded wait while allowing the acceptor to close simultaneously.
        Hide
        Guillaume Nodet added a comment -

        Thx for the test and patch Bill. I've slightly reworked the fix to delay the block until the first write, which seems to work better.

        Show
        Guillaume Nodet added a comment - Thx for the test and patch Bill. I've slightly reworked the fix to delay the block until the first write, which seems to work better.

          People

          • Assignee:
            Guillaume Nodet
            Reporter:
            Bill Kuker
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development