Hadoop Common / HADOOP-6558

archive does not work with distcp -update

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The following distcp command works.

      hadoop distcp -Dmapred.job.queue.name=q har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101 t101_distcp
      

      However, it does not work for -update.

      -bash-3.1$ hadoop distcp -Dmapred.job.queue.name=q -update har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101 t101_distcp
      10/01/29 20:06:53 INFO tools.DistCp: srcPaths=[har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101]
      10/01/29 20:06:53 INFO tools.DistCp: destPath=t101
      java.lang.IllegalArgumentException: Wrong FS: har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101/text-00000000, expected: hdfs://nn_hostname
              at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
              at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
              at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
              at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:463)
              at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:46)
              at org.apache.hadoop.fs.FilterFileSystem.getFileChecksum(FilterFileSystem.java:250)
              at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1204)
              at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1084)
              ...
      
      1. c6558_20100216b_y0.20.patch
        1 kB
        Tsz Wo Nicholas Sze
      2. c6558_20100216b.patch
        1 kB
        Tsz Wo Nicholas Sze
      3. c6558_20100216.patch
        0.6 kB
        Tsz Wo Nicholas Sze

        Activity

        Mahadev konar added a comment -

        Looks like this might be a problem with distcp, which requires the src and dest to be on the same filesystem?

        Tsz Wo Nicholas Sze added a comment -

        No. Distcp works if src and dest are in different filesystem schemes.

        I briefly checked the code. It seems that harfs.listPath("har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101") returns file statuses with hdfs:// paths, e.g. hdfs://hostname:8020/user/tsz/t101.har/t101/file1.

        Mahadev konar added a comment -

        Good catch. I'll fix that.

        Tsz Wo Nicholas Sze added a comment -

        Took a closer look: HarFileSystem extends FilterFileSystem, and it uses the underlying file system to get the file checksum. That's why we got "Wrong FS": HarFileSystem passes a har:// path to the underlying fs.getFileChecksum(..). In our case, the underlying fs is hdfs.
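
        The delegation described above can be illustrated with a small, self-contained sketch. It does not use Hadoop's classes: ToyFs and ToyHarFs are hypothetical stand-ins, and the scheme check only loosely mirrors what FileSystem.checkPath appears to do in the stack trace.

```java
import java.net.URI;

public class WrongFsSketch {
    /** Toy file system bound to one URI scheme (hypothetical, not Hadoop's FileSystem). */
    static class ToyFs {
        private final String scheme;

        ToyFs(String scheme) {
            this.scheme = scheme;
        }

        /** Rejects paths that carry a different scheme than this file system's. */
        void checkPath(URI path) {
            if (path.getScheme() != null && !scheme.equals(path.getScheme())) {
                throw new IllegalArgumentException(
                    "Wrong FS: " + path + ", expected: " + scheme + "://...");
            }
        }

        /** Returns a placeholder checksum after validating the path's scheme. */
        String getFileChecksum(URI path) {
            checkPath(path);
            return "checksum-of-" + path.getPath();
        }
    }

    /** Toy wrapper that forwards calls unchanged, in the style of a filter file system. */
    static class ToyHarFs {
        private final ToyFs underlying;

        ToyHarFs(ToyFs underlying) {
            this.underlying = underlying;
        }

        String getFileChecksum(URI harPath) {
            // The har:// path reaches the underlying fs without being translated.
            return underlying.getFileChecksum(harPath);
        }
    }

    public static void main(String[] args) {
        ToyHarFs har = new ToyHarFs(new ToyFs("hdfs"));
        URI src = URI.create(
            "har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101/text-00000000");
        try {
            har.getFileChecksum(src);
        } catch (IllegalArgumentException e) {
            // The wrapped fs rejects the har:// path because its scheme is not hdfs.
            System.out.println(e.getMessage());
        }
    }
}
```

        Plain copies never ask for a checksum, which would explain why only -update hits this path.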

        Tsz Wo Nicholas Sze added a comment -

        Moved this from MapReduce to Common since it is a fs issue.

        Tsz Wo Nicholas Sze added a comment -

        c6558_20100216.patch: returns null in HarFileSystem.getFileChecksum(..)
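
        A minimal sketch of the idea behind this patch, using toy classes rather than Hadoop's API. ChecksumSource, ToyHarFs, ToyHdfs and sameFile are hypothetical names. The rule that a null checksum means "not comparable, fall back to length" is an assumption about how the caller treats null, not something stated in this issue.

```java
public class NullChecksumSketch {
    /** Hypothetical minimal interface; Hadoop's real FileSystem API is much larger. */
    interface ChecksumSource {
        /** Returns a checksum, or null when the file system has none to offer. */
        String getFileChecksum(String path);
    }

    /** Stand-in for the patched archive file system: it reports "no checksum". */
    static class ToyHarFs implements ChecksumSource {
        @Override
        public String getFileChecksum(String path) {
            // No delegation to the underlying fs, so no scheme mismatch can occur.
            return null;
        }
    }

    /** Stand-in for a destination file system that does provide checksums. */
    static class ToyHdfs implements ChecksumSource {
        @Override
        public String getFileChecksum(String path) {
            return "crc:" + path.hashCode();
        }
    }

    /**
     * Assumed caller-side rule: files differ if their lengths differ;
     * otherwise they count as the same when either checksum is unavailable
     * or when both checksums are equal.
     */
    static boolean sameFile(long srcLen, String srcSum, long dstLen, String dstSum) {
        if (srcLen != dstLen) {
            return false;
        }
        if (srcSum == null || dstSum == null) {
            return true;
        }
        return srcSum.equals(dstSum);
    }

    public static void main(String[] args) {
        String srcSum = new ToyHarFs().getFileChecksum("/user/tsz/t101.har/t101/file1");
        String dstSum = new ToyHdfs().getFileChecksum("/user/tsz/t101_distcp/file1");
        // Equal lengths and a null source checksum: treated as already copied.
        System.out.println(sameFile(1024L, srcSum, 1024L, dstSum));
        // Different lengths: always treated as needing a copy.
        System.out.println(sameFile(1024L, srcSum, 2048L, dstSum));
    }
}
```

        Under that assumption, an -update run skips archive files whose length matches the destination and copies the rest.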

        Tsz Wo Nicholas Sze added a comment -

        I will add a test once HADOOP-6560 is committed.

        Tsz Wo Nicholas Sze added a comment -

        c6558_20100216b.patch: added a test.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12436084/c6558_20100216b.patch
        against trunk revision 910741.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/9/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/9/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/9/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/9/console

        This message is automatically generated.

        Tsz Wo Nicholas Sze added a comment -

        Manually tested the patch with "distcp -update" as shown in the description. It worked fine.

        -bash-3.1$ $H distcp ${Q} -update ${HAR_FULL}/${DIR} ${DIR}
        10/02/17 21:29:36 INFO tools.DistCp: srcPaths=[har://hdfs-nn:8020/user/tsz/t20.har/t20]
        10/02/17 21:29:36 INFO tools.DistCp: destPath=t20
        10/02/17 21:29:37 INFO tools.DistCp: sourcePathsCount=21
        10/02/17 21:29:37 INFO tools.DistCp: filesToCopyCount=0
        10/02/17 21:29:37 INFO tools.DistCp: bytesToCopyCount=0.0
        
        Mahadev konar added a comment -

        +1. The patch looks good.

        Tsz Wo Nicholas Sze added a comment -

        Thanks Mahadev for reviewing it.

        I have committed this.

        Tsz Wo Nicholas Sze added a comment -

        c6558_20100216b_y0.20.patch: for y0.20

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #175 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/175/)
        Return null in HarFileSystem.getFileChecksum(..) since no checksum algorithm is implemented.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #255 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/255/)


          People

          • Assignee:
            Tsz Wo Nicholas Sze
          • Reporter:
            Tsz Wo Nicholas Sze
          • Votes:
            0
          • Watchers:
            1
