  1. Hadoop Common
  2. HADOOP-17256

DistCp -update option will be invalid when distcp files from hdfs to S3


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tools/distcp
    • Labels: None

    Description

      We use DistCp with the -update option to copy a directory from HDFS to S3. When we run the same DistCp job a second time, it overwrites the files in the S3 directory instead of skipping the files that have not changed.
       
      Test Case:
      Run the following command twice. The modification time of the S3 files changes on every run.
      hadoop distcp -update /test/ s3a://${s3_bucket}/test/

       

      Checking the code in CopyMapper.java and S3AFileSystem.java shows the following:

      (1) On the first run, the DistCp job creates the files in S3, but the blockSize argument is unused.

       

      (2) On the second run, the DistCp job compares the fileSize and blockSize of the HDFS file with those of the S3 file.

       

      (3) Because blockSize is unused, asking for the blockSize of an S3 file returns a default value.

      In S3AFileSystem.java, we find that the default value of fs.s3a.block.size is 32 * 1024 * 1024.
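
      The three steps above can be modelled with a small, self-contained sketch. It is not Hadoop code: the class BlockSizeMismatchSketch, its Status type, and the methods createInObjectStore and canSkip are hypothetical stand-ins for the behaviour described in this report. It also assumes the source cluster uses a 128 MB HDFS block size, and it leaves out any other comparison the real check may perform.

```java
// Illustrative model only: these names are hypothetical stand-ins, not Hadoop APIs.
final class BlockSizeMismatchSketch {
    // Assumed block size of the source HDFS cluster (128 MB).
    static final long HDFS_BLOCK_SIZE = 128L * 1024 * 1024;
    // Default value of fs.s3a.block.size quoted in the report (32 MB).
    static final long S3A_DEFAULT_BLOCK_SIZE = 32L * 1024 * 1024;

    /** Minimal file metadata: length and reported block size. */
    static final class Status {
        final long length;
        final long blockSize;

        Status(long length, long blockSize) {
            this.length = length;
            this.blockSize = blockSize;
        }
    }

    /** Step (1) and (3): the store accepts a block size on create but never records it. */
    static Status createInObjectStore(long length, long requestedBlockSize) {
        // requestedBlockSize is deliberately ignored; the default is reported instead.
        return new Status(length, S3A_DEFAULT_BLOCK_SIZE);
    }

    /** Step (2): the skip check as described in the report (length and block size). */
    static boolean canSkip(Status source, Status target) {
        return source.length == target.length
            && source.blockSize == target.blockSize;
    }

    public static void main(String[] args) {
        Status source = new Status(1024L, HDFS_BLOCK_SIZE);
        Status target = createInObjectStore(source.length, source.blockSize);
        System.out.println("same length: " + (source.length == target.length));
        System.out.println("can skip: " + canSkip(source, target));
    }
}
```

      Under these assumptions the lengths match but canSkip returns false, because 128 MB is compared against the 32 MB default, so the file would be copied again on every run.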

       


       

      The HDFS blockSize seems to have no meaning in an object store such as S3. So I think there is no need to compare blockSize when running DistCp with the -update option against such a target.
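
      One possible reading of this proposal is sketched below. The class UpdateSkipSketch and the compareBlockSize parameter are hypothetical, introduced only for illustration. They are not an existing DistCp option or the fix that was eventually applied, and checksum comparison is omitted.

```java
// Hypothetical sketch of the proposed behaviour; not the actual DistCp implementation.
final class UpdateSkipSketch {
    /**
     * Decides whether an existing target file can be skipped.
     * Block size is compared only when compareBlockSize is true, which a caller
     * would set to false for targets that do not store a block size.
     */
    static boolean canSkip(long sourceLength, long sourceBlockSize,
                           long targetLength, long targetBlockSize,
                           boolean compareBlockSize) {
        if (sourceLength != targetLength) {
            return false; // different sizes always require a copy
        }
        return !compareBlockSize || sourceBlockSize == targetBlockSize;
    }

    public static void main(String[] args) {
        long hdfsBlock = 128L * 1024 * 1024; // assumed HDFS block size
        long s3aBlock = 32L * 1024 * 1024;   // fs.s3a.block.size default from the report
        System.out.println(canSkip(1024L, hdfsBlock, 1024L, s3aBlock, true));
        System.out.println(canSkip(1024L, hdfsBlock, 1024L, s3aBlock, false));
    }
}
```

      With the block-size comparison enabled the sketch prints false, and with it disabled it prints true, which is the skip behaviour the report expects from -update.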

      Attachments

        1. image-2020-09-10-19-48-38-574.png
          92 kB
          liuxiaolong
        2. image-2020-09-10-17-52-32-290.png
          16 kB
          liuxiaolong
        3. image-2020-09-10-17-47-01-653.png
          58 kB
          liuxiaolong
        4. image-2020-09-10-17-45-16-998.png
          101 kB
          liuxiaolong
        5. image-2020-09-10-17-33-50-505.png
          221 kB
          liuxiaolong
        6. image-2020-09-10-17-25-46-354.png
          58 kB
          liuxiaolong

            People

              Assignee: Unassigned
              Reporter: liuxiaolong (lxl)
              Votes: 0
              Watchers: 2
