Hadoop HDFS / HDFS-12325

SFTPFileSystem operations should restore cwd

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-beta1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      We've seen a case where writing to SFTPFileSystem led to unexpected behaviour:

      Given a directory ./data with more than one file in it, the steps to reproduce this error were simply:

      hdfs dfs -fs sftp://x.y.z -mkdir dir0
      hdfs dfs -fs sftp://x.y.z -copyFromLocal data dir0
      hdfs dfs -fs sftp://x.y.z -ls -R dir0
      

      But not all of the files show up in the ls output; in fact, more often than not only a single file shows up under that path...

      Digging deeper, we found that the rename, mkdirs and create operations in SFTPFileSystem change the current working directory during their execution. For example, in create there is:

            client.cd(parent.toUri().getPath());
            os = client.put(f.getName());
      

      The issue here is that SFTPConnectionPool caches SFTP sessions (in idleConnections), and each session retains its current working directory. So after these operations, sessions are put back into the cache with a changed working directory. The stale directory then compounds across calls and ends up causing unexpected, weird behaviour. Essentially, the error occurs when a single operation processes multiple file system objects and relative paths are used.

      The fix here is to restore the current working directory of the SFTP session before it is returned to the pool.
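
      For illustration, here is a minimal sketch of that idea for the create path. pwd(), cd() and put() are the real JSch ChannelSftp API, but the helper's shape is an assumption for illustration, not the committed patch:

            import com.jcraft.jsch.ChannelSftp;
            import com.jcraft.jsch.SftpException;
            import java.io.OutputStream;

            // Sketch only (not the committed patch): put the pooled
            // channel's cwd back before the session returns to
            // SFTPConnectionPool's idleConnections cache.
            OutputStream openRestoringCwd(ChannelSftp client,
                String parentDir, String fileName) throws SftpException {
              final String previousCwd = client.pwd(); // remember cached cwd
              client.cd(parentDir);
              OutputStream os = client.put(fileName);
              client.cd(previousCwd); // restore before the session is reused
              return os;
            }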

        Activity

        Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 18s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
              trunk Compile Tests
        +1 mvninstall 15m 6s trunk passed
        +1 compile 16m 9s trunk passed
        +1 checkstyle 0m 38s trunk passed
        +1 mvnsite 1m 37s trunk passed
        +1 findbugs 1m 28s trunk passed
        +1 javadoc 0m 58s trunk passed
              Patch Compile Tests
        +1 mvninstall 0m 47s the patch passed
        +1 compile 12m 4s the patch passed
        +1 javac 12m 4s the patch passed
        +1 checkstyle 0m 35s the patch passed
        +1 mvnsite 1m 30s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 45s the patch passed
        +1 javadoc 0m 51s the patch passed
              Other Tests
        -1 unit 8m 4s hadoop-common in the patch failed.
        +1 asflicense 0m 29s The patch does not generate ASF License warnings.
        63m 21s



        Reason Tests
        Failed junit tests hadoop.net.TestDNS



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-12325
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12882698/HDFS-12325.001.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux e478b4fc5d0b 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 8991f0b
        Default Java 1.8.0_144
        findbugs v3.1.0-RC1
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/20766/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20766/testReport/
        modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20766/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Arpit Agarwal added a comment - edited

        +1 for the patch. I will commit it shortly.

        I am not sure about the existing exception handling. The connection is 'leaked' on failure. That is not affected by this patch and should be fixed separately.

        Íñigo Goiri added a comment -

        Shouldn't the restore be done in a finally block?
        Something like:

        boolean renamed = true;
        final String previousCwd = channel.pwd();
        channel.cd("/");
        try {
          channel.rename(src.toUri().getPath(), dst.toUri().getPath());
        } catch (SftpException e) {
          renamed = false;
        } finally {
          channel.cd(previousCwd);
        }
        

        With proper handling of the cd exception.
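
        For concreteness, a sketch of one way to handle the cd exception itself (an assumption, not committed code; pwd(), cd(), rename() and disconnect() are the real JSch ChannelSftp API):

        import java.io.IOException;
        import com.jcraft.jsch.ChannelSftp;
        import com.jcraft.jsch.SftpException;

        // Sketch only: restore in finally, and treat a failing cd() as a
        // broken session rather than silently returning it to the pool.
        private boolean renameWithRestore(ChannelSftp channel, String src,
            String dst) throws IOException {
          boolean renamed = true;
          try {
            final String previousCwd = channel.pwd();
            channel.cd("/");
            try {
              channel.rename(src, dst);
            } catch (SftpException e) {
              renamed = false;
            } finally {
              channel.cd(previousCwd); // may itself throw SftpException
            }
          } catch (SftpException e) {
            // pwd() or cd() failed: the session's state is unknown, so
            // drop it instead of handing it back to the pool.
            channel.disconnect();
            throw new IOException("Failed to restore working directory", e);
          }
          return renamed;
        }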

        Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12219 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12219/)
        HDFS-12325. SFTPFileSystem operations should restore cwd. Contributed by (arp: rev 736ceab2f58fb9ab5907c5b5110bd44384038e6b)

        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java
        Chen Liang added a comment -

        Thanks Íñigo Goiri for the review, this is a good point!

        But as Arpit Agarwal mentioned, the exception handling in the current code has other issues. Specifically, it may leak client connections. For example, in SFTPFileSystem#create:

            if (parent == null || !mkdirs(client, parent, FsPermission.getDefault())) {
              parent = (parent == null) ? new Path("/") : parent;
              disconnect(client);
              throw new IOException(String.format(E_CREATE_DIR, parent));
            }
        

        The implication of this code is that the client should be disconnected before the exception is thrown to the caller. However, if mkdirs does not return at all but throws an exception itself, there is no catch clause in this method, so the exception propagates to the caller and the client is never disconnected, i.e. it is leaked. So, as Arpit suggested, we plan to revisit all of this class's exception handling and will file another JIRA to fix it. In that JIRA we will re-examine how to handle the exception from the cd call you pointed out.
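
        For concreteness, one possible restructuring (a sketch only, not the follow-up JIRA's patch; connect(), mkdirs(), disconnect() and E_CREATE_DIR are existing SFTPFileSystem members):

            ChannelSftp client = connect();
            try {
              Path parent = f.getParent();
              if (parent == null
                  || !mkdirs(client, parent, FsPermission.getDefault())) {
                parent = (parent == null) ? new Path("/") : parent;
                throw new IOException(String.format(E_CREATE_DIR, parent));
              }
              // ... open the output stream as before ...
            } catch (IOException e) {
              // Every failure path now releases the pooled client exactly
              // once, so a throwing mkdirs() can no longer leak it.
              disconnect(client);
              throw e;
            }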

        Arpit Agarwal added a comment - edited

        The documentation for SftpException is unclear. Perhaps we can reuse the connection if the exception code is SSH_FX_NO_SUCH_FILE but that isn't clear to me. I am inclined to just disconnect the connection on any exception.
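
        If we did want to distinguish, it might look like the following sketch. SftpException.id and ChannelSftp.SSH_FX_NO_SUCH_FILE are real JSch identifiers, but returnToPool() is a hypothetical helper, and whether reuse is actually safe is exactly the open question:

            // Hypothetical sketch: only reuse the channel when the failure
            // is a benign, well-understood code; otherwise discard it.
            void handleFailure(ChannelSftp channel, SftpException e) {
              if (e.id == ChannelSftp.SSH_FX_NO_SUCH_FILE) {
                returnToPool(channel); // hypothetical: session likely intact
              } else {
                channel.disconnect(); // unknown state: safest to discard
              }
            }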


          People

          • Assignee: Chen Liang
          • Reporter: Namit Maheshwari
          • Votes: 0
          • Watchers: 5
