Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: distcp
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      1. distcp -update launches job when there is at least one dir in source paths to be copied, even though there is nothing to copy.

      HADOOP-5675 added fileCount > 0 to be checked to decide whether to launch job. And HADOOP-5762 changed this to fileCount + dirCount > 0 to solve the issue of empty directories not getting copied to destination. With -update, dirCount is incremented without checking if that dir already exists at the destination. So distcp job is launched because of dirCount > 0 even though there is nothing to copy. Incrementing dirCount can be skipped if that dir already exists at the destination in case of -update.

      2. distcp doesn't skip copying file when we do -update on single file if the destfile already exists.

      When we do

      hadoop distcp -update srcfilename destfilename

      it seems to be comparing checksums of srcfilename and destfilename/srcfilename and so skip is not done. It should compare checksums of srcfilename and destfilename.

      See also MAPREDUCE-644.

      1. d_dirCount_648.patch
        2 kB
        Ravi Gummadi
      2. d_dirCount648.patch
        2 kB
        Ravi Gummadi
      3. d_dirCount648.v1.patch
        2 kB
        Ravi Gummadi
      4. d_648_644.patch
        8 kB
        Ravi Gummadi

        Issue Links

          Activity

          Hide
          Tsz Wo Nicholas Sze added a comment -

          You are right. We could improve this.

          Show
          Tsz Wo Nicholas Sze added a comment - You are right. We could improve this.
          Hide
          Ravi Gummadi added a comment -

          Attaching patch that makes setup() to increment dirCount only if the dir doesn't exist at the destination in case of -update. Also wouldn't write the name of the dir to _distcp_src_files if the dir need not be created at destination.

          Show
          Ravi Gummadi added a comment - Attaching patch that makes setup() to increment dirCount only if the dir doesn't exist at the destination in case of -update. Also wouldn't write the name of the dir to _distcp_src_files if the dir need not be created at destination.
          Hide
          Ravi Gummadi added a comment -

          May be we still need to write the name of the dir to _distcp_src_files even if the dir need not be created at the destination because it may be needed for finalize() to update permissions ?

          Show
          Ravi Gummadi added a comment - May be we still need to write the name of the dir to _distcp_src_files even if the dir need not be created at the destination because it may be needed for finalize() to update permissions ?
          Hide
          Ravi Gummadi added a comment -

          Attaching patch that applies after MAPREDUCE-649, MAPREDUCE-654, MAPREDUCE-645, MAPREDUCE-664, MAPREDUCE-661 and MAPREDUCE-650 are committed.

          Show
          Ravi Gummadi added a comment - Attaching patch that applies after MAPREDUCE-649 , MAPREDUCE-654 , MAPREDUCE-645 , MAPREDUCE-664 , MAPREDUCE-661 and MAPREDUCE-650 are committed.
          Hide
          Ravi Gummadi added a comment -

          Testcase of MAPREDUCE-644 works only with the fix of MAPREDUCE-648.

          Show
          Ravi Gummadi added a comment - Testcase of MAPREDUCE-644 works only with the fix of MAPREDUCE-648 .
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Hi Ravi, the patch cannot be applied to trunk.

          Show
          Tsz Wo Nicholas Sze added a comment - Hi Ravi, the patch cannot be applied to trunk.
          Hide
          Ravi Gummadi added a comment -

          Attaching patch that applies to trunk.

          Show
          Ravi Gummadi added a comment - Attaching patch that applies to trunk.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Patch looks mostly good. Some comments below:

          • Why throwing IOException in the following?
            +    if (destFileSys.exists(dst)) {
            +      if (!destFileSys.getFileStatus(dst).isDir()) {
            +        throw new IOException("Failed to mkdirs: " + dst+" is a file.");
            +      }
            +      return true;
            +    }
            

            Also, destFileSys.exists(dst) can be omitted for saving an RPC. We may have try-catch on destFileSys.getFileStatus(dst). If dst does not exist, a FNFE will be caught.

          • Could you rename skipfile to skippath since it includes dir after the patch?
          Show
          Tsz Wo Nicholas Sze added a comment - Patch looks mostly good. Some comments below: Why throwing IOException in the following? + if (destFileSys.exists(dst)) { + if (!destFileSys.getFileStatus(dst).isDir()) { + throw new IOException( "Failed to mkdirs: " + dst+ " is a file." ); + } + return true ; + } Also, destFileSys.exists(dst) can be omitted for saving an RPC. We may have try-catch on destFileSys.getFileStatus(dst). If dst does not exist, a FNFE will be caught. Could you rename skipfile to skippath since it includes dir after the patch?
          Hide
          Ravi Gummadi added a comment -

          In dirExists() method, if dst exists, it should be a dir. If dst is a file, throwing IOException.

          Does the following look better ?

          FileStatus status = null;
          try

          { status = destFileSys.getFileStatus(dst); }

          catch (FileNotFoundException e)

          { return false; }

          if (!status.isDir())

          { throw new IOException("Not a dir: " + dst+" is a file."); }

          return true;

          Show
          Ravi Gummadi added a comment - In dirExists() method, if dst exists, it should be a dir. If dst is a file, throwing IOException. Does the following look better ? FileStatus status = null; try { status = destFileSys.getFileStatus(dst); } catch (FileNotFoundException e) { return false; } if (!status.isDir()) { throw new IOException("Not a dir: " + dst+" is a file."); } return true;
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > Does the following look better ?
          Yes. We may use org.apache.hadoop.fs.FileAlreadyExistsException for IOException("Not a dir: " + dst+" is a file.").

          Also, please add a unit test.

          Show
          Tsz Wo Nicholas Sze added a comment - > Does the following look better ? Yes. We may use org.apache.hadoop.fs.FileAlreadyExistsException for IOException("Not a dir: " + dst+" is a file."). Also, please add a unit test.
          Hide
          Ravi Gummadi added a comment -

          Attaching patch with the suggested changes. This patch also includes the code changes needed for MAPREDUCE-644. Testcase validates both MAPREDUCE-644 and MAPREDUCE-648.

          Please review and provide your comments.

          Show
          Ravi Gummadi added a comment - Attaching patch with the suggested changes. This patch also includes the code changes needed for MAPREDUCE-644 . Testcase validates both MAPREDUCE-644 and MAPREDUCE-648 . Please review and provide your comments.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419610/d_648_644.patch
          against trunk revision 815249.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/80/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/80/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/80/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/80/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12419610/d_648_644.patch against trunk revision 815249. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/80/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/80/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/80/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/80/console This message is automatically generated.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1 the patch is perfect.

          Show
          Tsz Wo Nicholas Sze added a comment - +1 the patch is perfect.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Included the description of MAPREDUCE-644.

          Show
          Tsz Wo Nicholas Sze added a comment - Included the description of MAPREDUCE-644 .
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, Ravi!

          Show
          Tsz Wo Nicholas Sze added a comment - I have committed this. Thanks, Ravi!
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #40 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/40/)
          . Fix two distcp bugs: (1) it should not launch a job if all src paths are directories, and (2) it does not skip copying when updating a single file. Contributed by Ravi Gummadi

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #40 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/40/ ) . Fix two distcp bugs: (1) it should not launch a job if all src paths are directories, and (2) it does not skip copying when updating a single file. Contributed by Ravi Gummadi
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #84 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/84/)
          . Fix two distcp bugs: (1) it should not launch a job if all src paths are directories, and (2) it does not skip copying when updating a single file. Contributed by Ravi Gummadi

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #84 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/84/ ) . Fix two distcp bugs: (1) it should not launch a job if all src paths are directories, and (2) it does not skip copying when updating a single file. Contributed by Ravi Gummadi

            People

            • Assignee:
              Ravi Gummadi
              Reporter:
              Ravi Gummadi
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development