MINA SSHD / SSHD-54

Severe performance penalties when transferring data from a remote machine

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Labels: None

      Description

      I tried setting up a simple SSHD and doing an SCP of a large file from a remote machine.
      I managed to move around 3 MiB/s, compared to the native implementation's 11 MiB/s, which is unacceptable.

      I did some analysis with JRockit Mission Control and found that the problem is socket read latency. Most of the socket reads are only 512 bytes (small packets are normally a performance killer), and quite a lot return -2 from the Java NIO library, which seems to indicate an UNAVAILABLE code from sun.nio.ch.IOStatus. Thus it seems that MINA is the problem, polling too aggressively.

      I originally thought the bottleneck lay in the wait/notify mechanism of ChannelPipedInputStream, but ~80000 wait/notify pairs in 3 minutes is not too bad.

      Please investigate why Mina SCP suffers from bandwidth problems.

      If I use localhost as both client and server, no similar problem exists.

      1. brokentcp.png (164 kB, attached by Marcus Lagergren)

        Activity

        Marcus Lagergren added a comment -

        Turns out the problem is in MINA. For some reason, the default send buffer size (and receive buffer size) for server sockets is 1 KiB (!!!). Hacking the code to set it to a more common one megabyte before bind resolved my performance issue.
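
        Roughly, that hack amounts to something like the following against the MINA 2.x acceptor API (a sketch only; exactly where SSHD creates its NioSocketAcceptor differs by version, and the 1 MiB figure is illustrative):

        import java.io.IOException;
        import java.net.InetSocketAddress;

        import org.apache.mina.core.service.IoHandlerAdapter;
        import org.apache.mina.transport.socket.SocketSessionConfig;
        import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

        public class BigBufferAcceptor {
            public static NioSocketAcceptor create(int port) throws IOException {
                NioSocketAcceptor acceptor = new NioSocketAcceptor();
                // Raise the tiny defaults before bind so every accepted child socket inherits them.
                SocketSessionConfig cfg = acceptor.getSessionConfig();
                cfg.setReceiveBufferSize(1024 * 1024);  // 1 MiB
                cfg.setSendBufferSize(1024 * 1024);
                acceptor.setHandler(new IoHandlerAdapter());  // replace with the SSHD IoHandler in a real server
                acceptor.bind(new InetSocketAddress(port));
                return acceptor;
            }
        }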

        Marcus Lagergren added a comment -

        Since Apache FtpServer, which is built on the same MINA bits, doesn't have this problem when I run the same test on the same machine, I assume this can be configured from SSHD rather than by going into MINA.

        Guillaume Nodet added a comment -

        I haven't tried playing with those values, but it seems ftpserver does not use very high values:
        http://svn.apache.org/repos/asf/mina/ftpserver/trunk/core/src/main/java/org/apache/ftpserver/listener/nio/NioListener.java

        Marcus Lagergren added a comment -

        I know, that is strange as well. I still get sequences of stalls from time to time, and weird behavior when copying large amounts of data between machines even with the larger window, but only part of the time.

        Running Wireshark when there are problems shows a lot of
        [TCP segment of a reassembled PDU]

        and retransmits because the client side suddenly has window = 0

        Have you tried pushing a couple of hundred megs of data over the network with ScpCommand back and forth? It should be at line rate, but at times is nowhere near it. Race somewhere?

        Marcus Lagergren added a comment -

        OK, very simple reproducer:

        Get an SSHD instance with an SCP command up and running on machine A.
        Go to machine B. Execute scp (a_couple_of_hundred_megs_of_file) A:

        If the initial bandwidth is around 10 MiB/s, which it should be (it wasn't for me until I upped the receive buffer size), all is well.

        Try pressing Ctrl-Break a couple of times in the command prompt window running the server JVM on machine A, or send it a couple of kill -3, i.e. dump thread stack traces.

        This will make the session spin totally out of control and stall. A quick Wireshark capture running at the same time shows massive TCP degradation.

        Bug in Mina? Too many requests? Asynchronous spill? Race?
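
        For step one above, a minimal server along these lines should do (a sketch against the 0.x SSHD API; the port is arbitrary, and the PasswordAuthenticator signature shown is the one from later releases, so it may need tweaking for older ones):

        import org.apache.sshd.SshServer;
        import org.apache.sshd.server.PasswordAuthenticator;
        import org.apache.sshd.server.command.ScpCommandFactory;
        import org.apache.sshd.server.keyprovider.SimpleGeneratorHostKeyProvider;
        import org.apache.sshd.server.session.ServerSession;

        public class ScpTestServer {
            public static void main(String[] args) throws Exception {
                // Machine A: a bare-bones SSHD whose only job is to serve scp.
                SshServer sshd = SshServer.setUpDefaultServer();
                sshd.setPort(8000);
                sshd.setKeyPairProvider(new SimpleGeneratorHostKeyProvider("hostkey.ser"));
                sshd.setCommandFactory(new ScpCommandFactory());
                sshd.setPasswordAuthenticator(new PasswordAuthenticator() {
                    public boolean authenticate(String user, String password, ServerSession session) {
                        return true;  // throwaway benchmark server: accept any credentials
                    }
                });
                sshd.start();
                Thread.sleep(Long.MAX_VALUE);  // keep the JVM alive for the transfer
            }
        }

        Then from machine B, something like "scp -P 8000 big_file user@A:" drives the upload path.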

        Marcus Lagergren added a comment -

        I might add that it doesn't matter if the server VM is Hotspot or JRockit, which leads me to suspect a bug in SSHD under heavy load (or Mina).

        Marcus Lagergren added a comment -

        Attaching a screenshot of my Wireshark window after having the VM do one stack trace. What
        was previously smoothly flowing data between client (.233) and server (.239) suddenly becomes disrupted: we get dup ACKs, out-of-order transmits, and stalls.

        Marcus Lagergren added a comment -

        So I tried running the same server and starting a couple of computationally intensive threads after a while. Worked fine. It definitely has something to do with MINA/SSHD losing it when threads in the VM are stopped (which the VM has to do on Ctrl-Break to take the stack traces).

        Marcus Lagergren added a comment -

        Apache FtpServer in the same setup is immune to the Ctrl-Break repro, although we get a few retransmits.

        Thus, I believe the problem is in SSHD. Pushing packets too quickly? No exception handling? Not knowing MINA, it's hard to say what the problem might be.

        Marcus Lagergren added a comment -

        This seems mostly to be an issue between Windows and Linux (server on Windows). Linux to Linux, I can even suspend the server (Ctrl-Z) for several seconds and get a recovery.

        Differences in the TCP/IP stacks of Windows and Linux not making them play well together? Anyway, we should still expect OK behavior from Apache MINA SSHD upon slight interruptions.

        Shikhar Bhushan added a comment -

        It would be inefficient to not take the packet length into account when doing socket reads. This is at least an issue I ran into while reusing SSHD code, and after giving this hint to the socket read(), performance improved by a whole lot.

        This is where I'm coming from: http://code.google.com/p/commons-net-ssh/source/browse/src/main/java/org/apache/commons/net/ssh/transport/Decoder.java - see decode() & received()
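
        As a blocking-IO illustration of that hint (the linked decode()/received() code works on non-blocking buffers, but the idea is the same): once the 4-byte length field of the SSH binary packet is known, ask for exactly that many bytes instead of polling in small fixed-size chunks.

        import java.io.DataInputStream;
        import java.io.IOException;
        import java.io.InputStream;

        final class LengthHintedReader {
            // Illustration only: ignores encryption/MAC and does no sanity check on the length,
            // which real code must do before allocating.
            static byte[] readPacket(InputStream in) throws IOException {
                DataInputStream din = new DataInputStream(in);
                int packetLength = din.readInt();   // SSH binary packet: 4-byte length prefix
                byte[] rest = new byte[packetLength];
                din.readFully(rest);                // block until the whole packet has arrived
                return rest;
            }
        }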

        Shawn Pearce added a comment -

        I wrote a quick dd command in Java that just faked if=/dev/zero and of=/dev/null behavior and moved around 100 MB of uncompressed NULs.

        Pulling the fake /dev/zero from MINA SSHD (so data flow server --> client), my pair of systems does 27.6 MB/s. OpenSSH with standard dd does 55.2 MB/s. So throughput is just over half of what OpenSSH on the same hardware is capable of doing, but is still respectable.

        Pushing a fake /dev/zero from the client into MINA SSHD (so data flow is server <--- client), the same pair of systems does 3.7 MB/s. OpenSSH does this at 59 MB/s.

        Clearly there's something wrong with the data path coming in from the client; we just can't pull the data off the socket fast enough to keep up.
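
        A sketch of the kind of fake dd described above, for the curious (standalone here; the real one was presumably wired up through SSHD's command factory). "zero <bytes>" plays /dev/zero, anything else plays /dev/null, so only the SSH data path is measured:

        import java.io.IOException;
        import java.io.InputStream;
        import java.io.OutputStream;

        public class FakeDd {
            public static void main(String[] args) throws IOException {
                if (args.length >= 2 && args[0].equals("zero")) {
                    long remaining = Long.parseLong(args[1]);  // e.g. 100 * 1024 * 1024
                    byte[] block = new byte[8192];             // a fresh byte[] is already all NULs
                    OutputStream out = System.out;
                    while (remaining > 0) {
                        int n = (int) Math.min(block.length, remaining);
                        out.write(block, 0, n);
                        remaining -= n;
                    }
                    out.flush();
                } else {
                    InputStream in = System.in;
                    byte[] buf = new byte[8192];
                    while (in.read(buf) != -1) {
                        // drop the data on the floor, like of=/dev/null
                    }
                }
            }
        }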

        Shawn Pearce added a comment -

        For those of you above claiming the FTP server has high throughput with a very low buffer: the FTP server's data connection appears to use bog-standard blocking Sockets, and bypasses the MINA framework:

        http://svn.apache.org/repos/asf/mina/ftpserver/trunk/core/src/main/java/org/apache/ftpserver/impl/IODataConnection.java

        MINA SSHD on the other hand is doing all of its IO from client->server through the MINA IoProcessor, and it seems the IoProcessor can't poll() and read() the socket fast enough to keep up with the data stream, so TCP flow control forces the sender to slow down.

        As an experiment I spawned a background thread per connection to perform the actual SSHD packet decryption and parsing, so the IoProcessor only had to inject the IoBuffer onto the end of the queue and go back around to read again, and it still only got about 4 MB/s throughput.

        Shawn Pearce added a comment -

        I think I finally understand this issue now.

        MINA itself is actually forcing the receive buffer size to 1024 bytes, rather than the operating system default. The NioAcceptor is created with a DefaultSocketSessionConfig [1], which defaults both the send and receive buffer sizes to 1024 bytes.

        NioSocketAcceptor [2] then turns around and sets this receive buffer size into the accept socket, so all child sockets created for incoming connections inherit this low default value. However, because the configured value of 1024 matches the default of 1024 for the send buffer, doSetAll [3] never actually changes the send buffer size away from the operating system default.

        The FTP server on the other hand relies on the operating system defaults when it creates its standard blocking IO sockets for the data connections. So it's never affected by this strange behavior in the MINA framework.

        Unfortunately this means we have the following situation here for our sockets:

              | MINA | OS Default
         -----+------+-----------
          RCV | 1024 | 43690
          SND | 8192 | 8192

        So the receive window under MINA is 42x smaller than the OS default, but the send side is the OS default (which happens to be 8K on my Linux platform), as MINA does not directly set the send buffer size itself. So send throughput is actually reasonable, but receive throughput is horrible.
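
        A quick way to see the OS-default column of that table on a given box, using plain java.net sockets (the numbers vary by platform and kernel tuning):

        import java.io.IOException;
        import java.net.ServerSocket;
        import java.net.Socket;

        public class BufferDefaults {
            public static void main(String[] args) throws IOException {
                try (ServerSocket ss = new ServerSocket(); Socket s = new Socket()) {
                    // Unbound/unconnected sockets report the defaults the kernel would hand out.
                    System.out.println("listen socket SO_RCVBUF: " + ss.getReceiveBufferSize());
                    System.out.println("socket SO_RCVBUF: " + s.getReceiveBufferSize());
                    System.out.println("socket SO_SNDBUF: " + s.getSendBufferSize());
                }
            }
        }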

        There is an article [4] which explains this. Basically, if we want any decent TCP/IP throughput, we need to permit reasonably large TCP windows, to increase the amount of data we allow in flight before stopping and waiting for an ACK from the peer. If our receive window is only 1024 bytes, we can't even send a full TCP/IP packet on Ethernet without stopping and waiting for an ACK.

        This matches Marcus Lagergren's Wireshark dump attached to this bug. The transfer windows are just too small to get any throughput.

        Of course one can argue that the MINA default of 1024 bytes for the receive buffer is perhaps suitable, especially on a server with a very large number of idle connections, or connections where a human is typing all of the input data. But it's not suitable for a high-traffic data receiver, like the scp command.

        I tried creating my own SessionFactory to boost the receive size, and although the buffer claims to be larger, the effect seems to have been ignored by the underlying operating system, because performance is still horrible:

        setSessionFactory(new SessionFactory() {
            @Override
            protected ServerSession createSession(final IoSession io) throws Exception {
                final SocketSessionConfig c = (SocketSessionConfig) io.getConfig();
                c.setKeepAlive(keepAlive);
                c.setReceiveBufferSize(43690);
                // hand back to the superclass to actually build the session on the re-tuned socket
                return super.createSession(io);
            }
        });

        Linux seems to cap the connection at the upper bound that was in effect when it was accepted. If I set the receive buffer size of the NioAcceptor to 43690, then drop the receive buffer size of the individual connection to 1024 in the session factory, I can later boost it when a command starts, e.g. as the scp command begins execution, and performance behaves as expected.
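
        In code, the "boost when the command starts" half of that workaround is roughly the following (a hypothetical helper; obtaining the MINA IoSession from the server session is left to whatever accessor the SSHD version at hand provides):

        import org.apache.mina.core.session.IoSession;
        import org.apache.mina.transport.socket.SocketSessionConfig;

        final class ReceiveWindowBooster {
            // Call when a bulk command such as scp starts, to widen the receive buffer that was
            // deliberately kept at 1024 at accept time. Per the observation above, Linux only
            // lets it grow back up to the bound the acceptor was configured with (43690 here).
            static void boost(IoSession io, int bytes) {
                SocketSessionConfig cfg = (SocketSessionConfig) io.getConfig();
                cfg.setReceiveBufferSize(bytes);
            }
        }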

        [1] http://svn.apache.org/viewvc/mina/tags/2.0.0-RC1/core/src/main/java/org/apache/mina/transport/socket/DefaultSocketSessionConfig.java?view=markup#l44
        [2] http://svn.apache.org/viewvc/mina/tags/2.0.0-RC1/core/src/main/java/org/apache/mina/transport/socket/nio/NioSocketAcceptor.java?view=markup#l248
        [3] http://svn.apache.org/viewvc/mina/tags/2.0.0-RC1/core/src/main/java/org/apache/mina/transport/socket/AbstractSocketSessionConfig.java?view=markup#l58
        [4] http://www.ibm.com/developerworks/linux/library/l-hisock.html

        Shawn Pearce added a comment -

        Fixed. (Only for some reason I can't mark issues as fixed here in JIRA....)

        Committing to https://svn.apache.org/repos/asf/mina/sshd/trunk ...
        M sshd-core/src/main/java/org/apache/sshd/SshServer.java
        Committed r891122
        M sshd-core/src/main/java/org/apache/sshd/SshServer.java


          People

          • Assignee: Unassigned
          • Reporter: Marcus Lagergren
          • Votes: 0
          • Watchers: 3