Affects Version/s: Nightly Builds
Fix Version/s: 2.3
In our case the server was explicitly configured to disable SSH channelExec.
Our code was hanging trying to execute moveTo(). Stacktrace:
Technically the connection was alive, because the session had a configured timeout and the jcraft code kept sending keepalive SSH_MSG_GLOBAL_REQUEST messages, but the thread calling FileObject.moveTo() never returned.
I have changed SftpProviderTestCase to reproduce the problem: testRenameFile() hangs.
The patch (patch_sftp_tests_hang_no_exec.diff) is attached.
I traced the problem to the fact that VFS invokes the method com.jcraft.jsch.Channel.connect(). This method uses a timeout value of 0, in which case class com.jcraft.jsch.ChannelExec creates an instance of class com.jcraft.jsch.RequestExec that sends an SSH_MSG_CHANNEL_REQUEST packet with "want reply" set to 0.
Correspondingly, if the server supports SSH channelExec, it executes the specified command and returns some data.
But if the server does not support SSH channelExec, it sends nothing back, while the jcraft code blocks trying to read a response. This is the hang I am observing.
The fix would be to invoke com.jcraft.jsch.Channel.connect(int connectTimeout).
As a result, jcraft sends an SSH_MSG_CHANNEL_REQUEST packet with "want reply" set to 1, waits for an answer, and reacts to it.
Correspondingly, if the server supports SSH channelExec, it sends an SSH_MSG_CHANNEL_SUCCESS packet, then executes the specified command and returns some data.
If the server does not support SSH channelExec it sends an SSH packet SSH_MSG_CHANNEL_FAILURE.
jcraft reacts to either of these messages because it waits for one of them. If it receives SSH_MSG_CHANNEL_SUCCESS, it goes further and reads the response of the executed command.
If it receives SSH_MSG_CHANNEL_FAILURE it immediately reports this by throwing JSchException with message "failed to send channel request".
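The difference between the two "want reply" settings can be illustrated with a small stand-alone model. This is only a sketch of the behaviour described above, not jcraft code: the class WantReplyModel and its methods are hypothetical, the server is simulated with an in-memory queue, and a bounded poll stands in for the read that blocks forever in the real client. Only the message numbers (taken from RFC 4254) and the exception text are real.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical toy model of the exec request exchange; not part of jcraft or VFS.
class WantReplyModel {
    // Message numbers as defined in RFC 4254.
    static final int SSH_MSG_CHANNEL_DATA = 94;
    static final int SSH_MSG_CHANNEL_SUCCESS = 99;
    static final int SSH_MSG_CHANNEL_FAILURE = 100;

    // Simulated server: queues the packets it would send back for an exec request.
    static void serverHandleExec(boolean execEnabled, boolean wantReply,
                                 BlockingQueue<Integer> toClient) {
        if (wantReply) {
            // A reply is only sent when the client asked for one.
            toClient.add(execEnabled ? SSH_MSG_CHANNEL_SUCCESS : SSH_MSG_CHANNEL_FAILURE);
        }
        if (execEnabled) {
            toClient.add(SSH_MSG_CHANNEL_DATA); // output of the executed command
        }
    }

    // Simulated client: the bounded poll stands in for a read that would block forever.
    static String clientExec(boolean execEnabled, boolean wantReply, long readWaitMillis) {
        BlockingQueue<Integer> fromServer = new LinkedBlockingQueue<>();
        serverHandleExec(execEnabled, wantReply, fromServer);
        try {
            if (wantReply) {
                Integer reply = fromServer.poll(readWaitMillis, TimeUnit.MILLISECONDS);
                if (reply == null) {
                    return "no reply to channel request";
                }
                if (reply == SSH_MSG_CHANNEL_FAILURE) {
                    // The real client throws JSchException with this message.
                    return "failed to send channel request";
                }
            }
            Integer data = fromServer.poll(readWaitMillis, TimeUnit.MILLISECONDS);
            return data == null ? "would hang: no data to read" : "command output received";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("exec disabled, want reply 0: " + clientExec(false, false, 100));
        System.out.println("exec disabled, want reply 1: " + clientExec(false, true, 100));
        System.out.println("exec enabled,  want reply 0: " + clientExec(true, false, 100));
        System.out.println("exec enabled,  want reply 1: " + clientExec(true, true, 100));
    }
}
```

In this model, only the combination "exec disabled, want reply 0" leaves the client with nothing to read, which corresponds to the hang in moveTo().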
There is no hang whatsoever. Instead all tests from ProviderRenameTests fail with errors like
The test suite actually hangs at the end, but this is caused by https://issues.apache.org/jira/browse/VFS-588
I have patched VFS classes to always open jcraft's channels with timeouts. In addition the patch always sets some default timeout value on jcraft's session if none was configured via SftpFileSystemConfigBuilder.
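The fallback described above could look roughly like the following. This is a sketch only: the class SessionTimeouts, the method effectiveTimeoutMillis and the 30 second default are assumptions made for illustration, and the attached patch may choose a different default or structure.

```java
import java.time.Duration;

// Hypothetical helper; names and the default value are not taken from the patch.
class SessionTimeouts {
    // Assumed fallback used when no timeout was configured.
    static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(30);

    // Returns the configured timeout in milliseconds, or the default when the
    // value is missing or not positive (0 would mean "wait without a reply").
    static int effectiveTimeoutMillis(Integer configuredMillis) {
        if (configuredMillis == null || configuredMillis <= 0) {
            return Math.toIntExact(DEFAULT_TIMEOUT.toMillis());
        }
        return configuredMillis;
    }
}
```

The resulting value would then be used both for the session timeout and for every call that opens a channel, so that a timeout of 0 never reaches jcraft.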
Patch is also attached: patch_sftp_timeouts.diff