Hadoop Common / HADOOP-13024

Distcp with -delete feature on raw data not implemented

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: tools/distcp
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

    When running distcp with the -delete option on raw data (paths under /.reserved/raw), the following error occurs:

      [root@xxx bin]# hadoop distcp -delete -update /.reserved/raw/tmp/a /.reserved/raw/tmp/b
      16/04/14 02:54:01 ERROR tools.DistCp: Exception encountered
      java.io.IOException: DistCp failure: Job job_xxx has failed: Job commit failed: org.apache.hadoop.tools.CopyListing$InvalidInputException: The source path 'hdfs://nn/.reserved/raw/tmp/b' starts with /.reserved/raw but the target path 'hdfs://nn/NONE' does not. Either all or none of the paths must have this prefix.
              at org.apache.hadoop.tools.SimpleCopyListing.validatePaths(SimpleCopyListing.java:141)
              at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:85)
              at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
              at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
              at org.apache.hadoop.tools.mapred.CopyCommitter.deleteMissing(CopyCommitter.java:244)
              at org.apache.hadoop.tools.mapred.CopyCommitter.commitJob(CopyCommitter.java:94)
              at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
              at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      
      
              at org.apache.hadoop.tools.DistCp.execute(DistCp.java:187)
              at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
              at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
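
The error message states the rule that path validation enforces: either all or none of the paths must start with /.reserved/raw. The following minimal Java sketch reproduces that rule from the message text alone. The class RawPrefixCheck and its methods are hypothetical and are not taken from the Hadoop source of SimpleCopyListing.validatePaths. Applied to the pair from the log, it rejects the source hdfs://nn/.reserved/raw/tmp/b against the target hdfs://nn/NONE.

```java
import java.net.URI;

// Hypothetical illustration of the prefix rule quoted in the DistCp error.
// Written from the error message only; this is not the Hadoop implementation.
class RawPrefixCheck {
    static final String RAW_PREFIX = "/.reserved/raw";

    // Drop scheme and authority (e.g. "hdfs://nn") and test only the path part.
    // Assumes the input is a valid URI string (no unescaped spaces).
    static boolean isRaw(String qualifiedPath) {
        String path = URI.create(qualifiedPath).getPath();
        return path.equals(RAW_PREFIX) || path.startsWith(RAW_PREFIX + "/");
    }

    // "Either all or none of the paths must have this prefix."
    static void validate(String source, String target) {
        if (isRaw(source) != isRaw(target)) {
            throw new IllegalArgumentException("Source '" + source + "' and target '"
                + target + "' disagree on the " + RAW_PREFIX
                + " prefix; either all or none of the paths must have it.");
        }
    }

    public static void main(String[] args) {
        // The pair reported in the log: the target directory is listed as the
        // source, and the placeholder /NONE is used as the target.
        try {
            validate("hdfs://nn/.reserved/raw/tmp/b", "hdfs://nn/NONE");
            System.out.println("valid");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```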
      

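    The copy itself succeeds. The failure happens during job commit, when DistCp deletes files in the target that no longer exist in the source: CopyCommitter.deleteMissing builds a listing of the target directory (here /.reserved/raw/tmp/b) against the placeholder target path NONE, and path validation then rejects that listing because the target directory is under /.reserved/raw while NONE is not.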
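
A possible way around the check, sketched under the assumption that a placeholder inside the raw namespace (here /.reserved/raw/NONE) is acceptable when the target is raw, is to choose the placeholder so that it always has the same prefix as the target directory. PlaceholderChooser, placeholderFor and RAW_NONE_PATH are hypothetical names; this is not necessarily how the attached patches fix the bug.

```java
import java.net.URI;

// Hypothetical sketch: pick the placeholder target used when listing the
// target directory for -delete so that it shares the /.reserved/raw prefix
// whenever the target does. Names are illustrative, not Hadoop's.
class PlaceholderChooser {
    static final String RAW_PREFIX = "/.reserved/raw";
    static final String NONE_PATH = "/NONE";                    // placeholder seen in the error
    static final String RAW_NONE_PATH = RAW_PREFIX + NONE_PATH; // assumed raw-domain variant

    // Same prefix test as the validation rule: compare only the path part.
    static boolean isRaw(String qualifiedPath) {
        String path = URI.create(qualifiedPath).getPath();
        return path.equals(RAW_PREFIX) || path.startsWith(RAW_PREFIX + "/");
    }

    // Keep the placeholder on the same side of the raw/non-raw boundary as the target.
    static String placeholderFor(String targetFinalPath) {
        return isRaw(targetFinalPath) ? RAW_NONE_PATH : NONE_PATH;
    }

    public static void main(String[] args) {
        System.out.println(placeholderFor("hdfs://nn/.reserved/raw/tmp/b")); // /.reserved/raw/NONE
        System.out.println(placeholderFor("hdfs://nn/tmp/b"));               // /NONE
    }
}
```

With this choice, the listing built during deletion pairs /.reserved/raw/tmp/b with /.reserved/raw/NONE, so both paths carry the prefix and the rule is satisfied, while non-raw copies keep the /NONE placeholder seen in the error.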

    Attachments

    1. HADOOP-13024.patch.10 (9 kB, Mavin Martin)
    2. HADOOP-13024.patch.9 (8 kB, Mavin Martin)
    3. HADOOP-13024.patch.8 (8 kB, Mavin Martin)
    4. HADOOP-13024.patch.7 (8 kB, Mavin Martin)
    5. HADOOP-13024.patch.6 (6 kB, Mavin Martin)
    6. HADOOP-13024.patch.5 (3 kB, Mavin Martin)
    7. HADOOP-13024.patch.4 (3 kB, Mavin Martin)
    8. HADOOP-13024.patch.3 (3 kB, Mavin Martin)
    9. HADOOP-13024.patch (3 kB, Mavin Martin)
    10. HADOOP-13024.patch (3 kB, Mavin Martin)

    People

    • Assignee: Mavin Martin (mavinmartin@gmail.com)
    • Reporter: Mavin Martin (mavinmartin@gmail.com)
    • Votes: 0
    • Watchers: 8

    Dates

    • Created:
    • Updated:
    • Resolved: