Hadoop Common
  1. Hadoop Common
  2. HADOOP-7771

NPE when running hdfs dfs -copyToLocal, -get etc

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 0.24.0
    • Fix Version/s: 0.23.0, 0.24.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      NPE when running hdfs dfs -copyToLocal if the destination directory does not exist. The behavior in branch-0.20-security is to create the directory and copy/get the contents from source.

      1. HADOOP-7771.patch
        4 kB
        John George
      2. HADOOP-7771.patch
        4 kB
        John George
      3. HADOOP-7771.patch
        5 kB
        John George
      4. HADOOP-7771.patch
        5 kB
        John George
      5. HADOOP-7771.patch
        4 kB
        John George
      6. HADOOP-7771.patch
        14 kB
        Daryn Sharp

        Issue Links

          Activity

          Hide
          John George added a comment -

          $ hadoop fs -copyToLocal /tmp/tryout /tmp/thisisit.2
          copyToLocal: Fatal internal error
          java.lang.NullPointerException
          at org.apache.hadoop.fs.shell.PathData.getPathDataForChild(PathData.java:182)
          at org.apache.hadoop.fs.shell.CommandWithDestination.processPaths(CommandWithDestination.java:115)
          at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:329)
          at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:302)
          at org.apache.hadoop.fs.shell.CommandWithDestination.processPaths(CommandWithDestination.java:116)
          at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:272)
          at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:255)
          at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:239)
          at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:105)
          at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:185)
          at org.apache.hadoop.fs.shell.Command.run(Command.java:149)
          at org.apache.hadoop.fs.FsShell.run(FsShell.java:254)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
          at org.apache.hadoop.fs.FsShell.main(FsShell.java:296)

          Show
          John George added a comment - $ hadoop fs -copyToLocal /tmp/tryout /tmp/thisisit.2 copyToLocal: Fatal internal error java.lang.NullPointerException at org.apache.hadoop.fs.shell.PathData.getPathDataForChild(PathData.java:182) at org.apache.hadoop.fs.shell.CommandWithDestination.processPaths(CommandWithDestination.java:115) at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:329) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:302) at org.apache.hadoop.fs.shell.CommandWithDestination.processPaths(CommandWithDestination.java:116) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:272) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:255) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:239) at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:105) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:185) at org.apache.hadoop.fs.shell.Command.run(Command.java:149) at org.apache.hadoop.fs.FsShell.run(FsShell.java:254) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.fs.FsShell.main(FsShell.java:296)
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12500881/HADOOP-7771.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed the unit tests build

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/317//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/317//artifact/trunk/hadoop-common-project/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/317//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12500881/HADOOP-7771.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed the unit tests build +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/317//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/317//artifact/trunk/hadoop-common-project/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/317//console This message is automatically generated.
          Hide
          John George added a comment -

          I dont think the core test failure is related. Common builds have been failing on core tests and findbugs for sometime now.
          I ran mvn clean install -Pdist -Dtar -Ptest-patch on my desktop and the build succeeded.

          Show
          John George added a comment - I dont think the core test failure is related. Common builds have been failing on core tests and findbugs for sometime now. I ran mvn clean install -Pdist -Dtar -Ptest-patch on my desktop and the build succeeded.
          Hide
          Daryn Sharp added a comment -
              * @throws IOException if this object does not exist or is not a directory
              */
             public PathData getPathDataForChild(PathData child) throws IOException {
          -    if (!stat.isDirectory()) {
          +    if (exists && !stat.isDirectory()) {
          

          If the javadoc is true, I think should be:
          if (!exists || !stat.isDirectory())

          The use of isDstDir appears wrong and is duplication of most of the prior logic. Why is this flag used to force processPath() to treat all recursive destinations as directories if the top level dest is a directory? Especially if there's 1 source dir arg? This method is invoked recursive so I'm not sure why it assumes everything is a directory if the top level dst is a directory?

          Are you sure the PathData change isn't all that's needed?

          Show
          Daryn Sharp added a comment - * @ throws IOException if this object does not exist or is not a directory */ public PathData getPathDataForChild(PathData child) throws IOException { - if (!stat.isDirectory()) { + if (exists && !stat.isDirectory()) { If the javadoc is true, I think should be: if (!exists || !stat.isDirectory()) The use of isDstDir appears wrong and is duplication of most of the prior logic. Why is this flag used to force processPath() to treat all recursive destinations as directories if the top level dest is a directory? Especially if there's 1 source dir arg? This method is invoked recursive so I'm not sure why it assumes everything is a directory if the top level dst is a directory? Are you sure the PathData change isn't all that's needed?
          Hide
          John George added a comment -

          If the javadoc is true, I think should be:
          if (!exists || !stat.isDirectory())

          The purpose here is to ensure that destination is a directory and not a file. So previously when we checked for

          if (!stat.isDirectory())
          

          all we were trying to do was to ensure that the destination is a directory. The issue is that stat could be null and so all that the check exist does is to ensure that stat is not null before we even check if it is directory or not! Hence, I believe the check if (exists && !stat.isDirectory()) is the right check.

          // if the destination is a directory, make target a child path,
          // else use the destination as-is
          -    if (dst.exists && dst.stat.isDirectory()) {
          

          Here, (previously) the check for destination succeeded only if the 'dst' already existed. With the new code, it ensures that we detect that the destination is a directory even in cases where the destination does not exist. The logic does not change at all. Am I missing something obvious?

          Daryn,
          Can you elaborate on what you mean by "The use of isDstDir appears wrong and is duplication of most of the prior logic. "? I would definitely like to reuse the logic if it already exists. Could you give me a cleared idea as to what you mean?

          Show
          John George added a comment - If the javadoc is true, I think should be: if (!exists || !stat.isDirectory()) The purpose here is to ensure that destination is a directory and not a file. So previously when we checked for if (!stat.isDirectory()) all we were trying to do was to ensure that the destination is a directory. The issue is that stat could be null and so all that the check exist does is to ensure that stat is not null before we even check if it is directory or not! Hence, I believe the check if (exists && !stat.isDirectory()) is the right check. // if the destination is a directory, make target a child path, // else use the destination as-is - if (dst.exists && dst.stat.isDirectory()) { Here, (previously) the check for destination succeeded only if the 'dst' already existed. With the new code, it ensures that we detect that the destination is a directory even in cases where the destination does not exist. The logic does not change at all. Am I missing something obvious? Daryn, Can you elaborate on what you mean by "The use of isDstDir appears wrong and is duplication of most of the prior logic. "? I would definitely like to reuse the logic if it already exists. Could you give me a cleared idea as to what you mean?
          Hide
          Jitendra Nath Pandey added a comment -

          Please add comment regarding what isDstDir means.
          In the test, what is the reason behind changing the standard error stream?

          Show
          Jitendra Nath Pandey added a comment - Please add comment regarding what isDstDir means. In the test, what is the reason behind changing the standard error stream?
          Hide
          John George added a comment -

          attaching a patch based on Jitendra's comments.

          Show
          John George added a comment - attaching a patch based on Jitendra's comments.
          Hide
          Jitendra Nath Pandey added a comment -

          Many methods are updating the value of dst, isDstDir should also be updated in those places.

          Show
          Jitendra Nath Pandey added a comment - Many methods are updating the value of dst, isDstDir should also be updated in those places.
          Hide
          John George added a comment -

          Attaching another patch based on Jitendra's latest comments.

          Show
          John George added a comment - Attaching another patch based on Jitendra's latest comments.
          Hide
          Daryn Sharp added a comment -

          This is not the right approach. I had the problem fixed many months ago, but lost the patch. I'll have a version up shortly.

          Show
          Daryn Sharp added a comment - This is not the right approach. I had the problem fixed many months ago, but lost the patch. I'll have a version up shortly.
          Hide
          Daryn Sharp added a comment -

          Fixing the NPE exposes nasty underlying problems when getting directories. Will take me a bit longer to remember what I did. Patch will be up this weekend.

          Show
          Daryn Sharp added a comment - Fixing the NPE exposes nasty underlying problems when getting directories. Will take me a bit longer to remember what I did. Patch will be up this weekend.
          Hide
          John George added a comment -

          attaching a patch with a different approach. Using processPath to verify that a directory needs to be created or not... this avoiding the use of isDstDir

          Show
          John George added a comment - attaching a patch with a different approach. Using processPath to verify that a directory needs to be created or not... this avoiding the use of isDstDir
          Hide
          John George added a comment -

          This patch removes two unnecessary functions in CommandWithDestination as pointed by Jitendra. This is specific to -get and -copyToLocal commands. This fixes the NPE and the functionality of these two commands.

          Show
          John George added a comment - This patch removes two unnecessary functions in CommandWithDestination as pointed by Jitendra. This is specific to -get and -copyToLocal commands. This fixes the NPE and the functionality of these two commands.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12501434/HADOOP-7771.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/340//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501434/HADOOP-7771.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/340//console This message is automatically generated.
          Hide
          Jitendra Nath Pandey added a comment -

          The new approach taken in John's latest patch looks good to me. It seems the patch needs to be updated against the latest trunk, apart from that I am +1 for the patch.

          Daryn,
          Are you ok with the latest patch?

          Show
          Jitendra Nath Pandey added a comment - The new approach taken in John's latest patch looks good to me. It seems the patch needs to be updated against the latest trunk, apart from that I am +1 for the patch. Daryn, Are you ok with the latest patch?
          Hide
          Daryn Sharp added a comment -

          Fixes the NPE and also the recursion issues that cause paths to erroneously be copied into existing subdirs in the remote tree. It looks worse than it is, but most of it is moving common code into CommandWithDestination.

          HDFS tests need to be updated, but HDFS-2038 has them temporarily broken. I've extensively tested via cmdline.

          Show
          Daryn Sharp added a comment - Fixes the NPE and also the recursion issues that cause paths to erroneously be copied into existing subdirs in the remote tree. It looks worse than it is, but most of it is moving common code into CommandWithDestination . HDFS tests need to be updated, but HDFS-2038 has them temporarily broken. I've extensively tested via cmdline.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12501556/HADOOP-7771.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/342//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/342//artifact/trunk/hadoop-common-project/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/342//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501556/HADOOP-7771.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/342//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/342//artifact/trunk/hadoop-common-project/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/342//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Findbugs warnings are unrelated to this patch.

          Show
          Daryn Sharp added a comment - Findbugs warnings are unrelated to this patch.
          Hide
          Daryn Sharp added a comment -

          John's patch introduces at least these bugs:

          1. Copy multiple args to a non-existent destination is supposed to fail. The patch auto-creates the destination as a directory, thus shorting out the check in the parent class. Ex. "fs -cp f1 f2 f3 dir" should fail if dir doesn't exist, the patch changed it to succeed.
          2. If the source is a directory, and the destination doesn't exist, then the source is effectively supposed to be renamed as the destination. The auto-creates the destination as a directory and copies the source into the destination. Ex. "fs -get dir does-not-exist" should create "does-not-exist" with the contents of "dir". This patch creates "does-not-exist/dir".
          3. If a directory is copied to a file, the exception thrown is not using the PathData string.

          In effect, we are adding bugs, and trading a NPE with copying to the wrong destination, so it's a wash as far as testing is concerned.

          FWIW, early this year, I ported all this copy/move code as part of the FsShell redesign. I realized I introduced bugs way back in June or July and prepared a patch. I thought I lost the patch, so John started working this jira, but I dug it up the patch this weekend. I stripped out all the extra functionality that was added in that patch and posted the result.

          Show
          Daryn Sharp added a comment - John's patch introduces at least these bugs: Copy multiple args to a non-existent destination is supposed to fail. The patch auto-creates the destination as a directory, thus shorting out the check in the parent class. Ex. "fs -cp f1 f2 f3 dir" should fail if dir doesn't exist, the patch changed it to succeed. If the source is a directory, and the destination doesn't exist, then the source is effectively supposed to be renamed as the destination. The auto-creates the destination as a directory and copies the source into the destination. Ex. "fs -get dir does-not-exist" should create "does-not-exist" with the contents of "dir". This patch creates "does-not-exist/dir". If a directory is copied to a file, the exception thrown is not using the PathData string. In effect, we are adding bugs, and trading a NPE with copying to the wrong destination, so it's a wash as far as testing is concerned. FWIW, early this year, I ported all this copy/move code as part of the FsShell redesign. I realized I introduced bugs way back in June or July and prepared a patch. I thought I lost the patch, so John started working this jira, but I dug it up the patch this weekend. I stripped out all the extra functionality that was added in that patch and posted the result.
          Hide
          Daryn Sharp added a comment -

          Another bug that I fixed is "fs -get dir existing-dir" should create "existing-dir/dir". But if "existing-dir/dir" exists too, it creates "existing-dir/dir/dir".

          Show
          Daryn Sharp added a comment - Another bug that I fixed is "fs -get dir existing-dir" should create "existing-dir/dir". But if "existing-dir/dir" exists too, it creates "existing-dir/dir/dir".
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          Show
          Tsz Wo Nicholas Sze added a comment - +1 patch looks good.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1291 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1291/)
          HADOOP-7771. FsShell -copyToLocal, -get, etc. commands throw NPE if the destination directory does not exist. Contributed by John George and Daryn Sharp

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195760
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #1291 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1291/ ) HADOOP-7771 . FsShell -copyToLocal, -get, etc. commands throw NPE if the destination directory does not exist. Contributed by John George and Daryn Sharp szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195760 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1215 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1215/)
          HADOOP-7771. FsShell -copyToLocal, -get, etc. commands throw NPE if the destination directory does not exist. Contributed by John George and Daryn Sharp

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195760
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #1215 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1215/ ) HADOOP-7771 . FsShell -copyToLocal, -get, etc. commands throw NPE if the destination directory does not exist. Contributed by John George and Daryn Sharp szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195760 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, John and Daryn!

          Show
          Tsz Wo Nicholas Sze added a comment - I have committed this. Thanks, John and Daryn!
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Commit #122 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/122/)
          svn merge -c 1195760 from trunk for HADOOP-7771.

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195762
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Commit #122 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/122/ ) svn merge -c 1195760 from trunk for HADOOP-7771 . szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195762 Files : /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-0.23-Commit #121 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/121/)
          svn merge -c 1195760 from trunk for HADOOP-7771.

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195762
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-0.23-Commit #121 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/121/ ) svn merge -c 1195760 from trunk for HADOOP-7771 . szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195762 Files : /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1239 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1239/)
          HADOOP-7771. FsShell -copyToLocal, -get, etc. commands throw NPE if the destination directory does not exist. Contributed by John George and Daryn Sharp

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195760
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #1239 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1239/ ) HADOOP-7771 . FsShell -copyToLocal, -get, etc. commands throw NPE if the destination directory does not exist. Contributed by John George and Daryn Sharp szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195760 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Commit #129 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/129/)
          svn merge -c 1195760 from trunk for HADOOP-7771.

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195762
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-0.23-Commit #129 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/129/ ) svn merge -c 1195760 from trunk for HADOOP-7771 . szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195762 Files : /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #850 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/850/)
          HADOOP-7771. FsShell -copyToLocal, -get, etc. commands throw NPE if the destination directory does not exist. Contributed by John George and Daryn Sharp

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195760
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #850 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/850/ ) HADOOP-7771 . FsShell -copyToLocal, -get, etc. commands throw NPE if the destination directory does not exist. Contributed by John George and Daryn Sharp szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195760 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #884 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/884/)
          HADOOP-7771. FsShell -copyToLocal, -get, etc. commands throw NPE if the destination directory does not exist. Contributed by John George and Daryn Sharp

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195760
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #884 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/884/ ) HADOOP-7771 . FsShell -copyToLocal, -get, etc. commands throw NPE if the destination directory does not exist. Contributed by John George and Daryn Sharp szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1195760 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Command.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/PathData.java
          Hide
          Uma Maheswara Rao G added a comment -

          Hi Daryn,
          Looks this patch deletes the file if it fails to write completely.
          I remember the old behavior was that the partial file still presents. I don't see any discussion about this behavior change here. Is this discussed somewhere?

                tempFile = null;
              } finally {
                if (tempFile != null) {
                  tempFile.fs.delete(tempFile.path, false);
                }
              }
          
          Show
          Uma Maheswara Rao G added a comment - Hi Daryn, Looks this patch deletes the file if it fails to write completely. I remember the old behavior was that the partial file still presents. I don't see any discussion about this behavior change here. Is this discussed somewhere? tempFile = null ; } finally { if (tempFile != null ) { tempFile.fs.delete(tempFile.path, false ); } }
          Hide
          Daryn Sharp added a comment -

          The former behavior for some of the copy commands was to begin overwriting the target file. This jira commonized the copy code in FsShell since it was inconsistent and buggy. The existing code that I standardized on copies into a temp file, and then renames the temp file if the copy is successful. If the copy fails, the temp file is deleted.

          Do you feel it would be useful to retain the temp file?

          Show
          Daryn Sharp added a comment - The former behavior for some of the copy commands was to begin overwriting the target file. This jira commonized the copy code in FsShell since it was inconsistent and buggy. The existing code that I standardized on copies into a temp file, and then renames the temp file if the copy is successful. If the copy fails, the temp file is deleted. Do you feel it would be useful to retain the temp file?
          Hide
          Uma Maheswara Rao G added a comment -

          Do you feel it would be useful to retain the temp file?

          User will not about temp file paths right? Then there is not point in retaining temp files. Only concern is, In older version, if write fails also, other clients can read the file till where we have written the file successfully. Leter it may get recovered and closed.
          But now we are completely invalidating that finalized blocks on write failure.

          Also other point is, what if client abuptly shutdown? I think temp files will not be cleaned forever right?

          Regards,
          Uma

          Show
          Uma Maheswara Rao G added a comment - Do you feel it would be useful to retain the temp file? User will not about temp file paths right? Then there is not point in retaining temp files. Only concern is, In older version, if write fails also, other clients can read the file till where we have written the file successfully. Leter it may get recovered and closed. But now we are completely invalidating that finalized blocks on write failure. Also other point is, what if client abuptly shutdown? I think temp files will not be cleaned forever right? Regards, Uma
          Hide
          Uma Maheswara Rao G added a comment -

          Do you feel it would be useful to retain the temp file?

          User will not know about temp file paths right? Then there is not point in retaining temp files. Only concern is, In older version, if write fails also, other clients can read the file till where we have written the file successfully. Later it may get recovered and closed.
          But now we are completely invalidating that finalized blocks on write failure.

          Also other point is, what if client abuptly shutdown? I think temp files will not be cleaned forever right?

          Regards,
          Uma

          Show
          Uma Maheswara Rao G added a comment - Do you feel it would be useful to retain the temp file? User will not know about temp file paths right? Then there is not point in retaining temp files. Only concern is, In older version, if write fails also, other clients can read the file till where we have written the file successfully. Later it may get recovered and closed. But now we are completely invalidating that finalized blocks on write failure. Also other point is, what if client abuptly shutdown? I think temp files will not be cleaned forever right? Regards, Uma
          Hide
          Daryn Sharp added a comment -

          Yes, I wouldn't expect users to know or care about temp files. I don't recall which of all the copy variants already had this behavior but I standardized it across the copy commands. Distcp has similar behavior, although I think it copies into a temp subdir. Maybe it would make sense to mimic what it does?

          It's true that users could read the partial files, but I question if users would expect a partial file in the vast majority of use cases. Let's say a copy failed, the user didn't check the exit code possibly because 20x exit codes were unreliable, and then attempted to run a job. The job might "succeed" to process the partial data but produce erroneous results. Users might be baffled that re-running the job produces different results. A dev may go on a wild goose chase and resolve as "works for me" when they can't reproduce.

          If the client is interrupted by a signal, then it often used to leave a temp file because the signal wasn't breaking out correctly and affecting the shutdown hooks. I recently fixed that. I've been meaning to investigation/reproduce/fix a few user reports that temp files are still sometimes left around. If you do a hard kill (-9), then of course there's nothing that can be done unless the NN has a notion of this "this file should be deleted unless I explicitly close it".

          What are your thoughts?

          Show
          Daryn Sharp added a comment - Yes, I wouldn't expect users to know or care about temp files. I don't recall which of all the copy variants already had this behavior but I standardized it across the copy commands. Distcp has similar behavior, although I think it copies into a temp subdir. Maybe it would make sense to mimic what it does? It's true that users could read the partial files, but I question if users would expect a partial file in the vast majority of use cases. Let's say a copy failed, the user didn't check the exit code possibly because 20x exit codes were unreliable, and then attempted to run a job. The job might "succeed" to process the partial data but produce erroneous results. Users might be baffled that re-running the job produces different results. A dev may go on a wild goose chase and resolve as "works for me" when they can't reproduce. If the client is interrupted by a signal, then it often used to leave a temp file because the signal wasn't breaking out correctly and affecting the shutdown hooks. I recently fixed that. I've been meaning to investigation/reproduce/fix a few user reports that temp files are still sometimes left around. If you do a hard kill (-9), then of course there's nothing that can be done unless the NN has a notion of this "this file should be deleted unless I explicitly close it". What are your thoughts?
          Hide
          Kousuke Saruta added a comment -

          In this jira, the name of the temp file has the suffix ".COPYING" but I think it can cause a bad effect when we run mapreduce.
          When we run mapreduce, temp files should be ignored so we need to consider the name of the temp file.
          If you have no objections, I will create a jira about that.

          Show
          Kousuke Saruta added a comment - In this jira, the name of the temp file has the suffix ". COPYING " but I think it can cause a bad effect when we run mapreduce. When we run mapreduce, temp files should be ignored so we need to consider the name of the temp file. If you have no objections, I will create a jira about that.
          Hide
          Daryn Sharp added a comment -

          What is your alternate idea for handling of temp files?

          Show
          Daryn Sharp added a comment - What is your alternate idea for handling of temp files?
          Hide
          Kousuke Saruta added a comment -

          Naming the temp file starting with "." or "_" so that mapreduce can ignore it.

          Show
          Kousuke Saruta added a comment - Naming the temp file starting with "." or "_" so that mapreduce can ignore it.

            People

            • Assignee:
              John George
              Reporter:
              John George
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development