Hadoop Map/Reduce: MAPREDUCE-643

distcp -pugp error message is not clear when chgrp fails

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: distcp
    • Labels: None

    Description

      To achieve rsync-like behavior between a local directory and an HDFS instance, a pseudo-distributed MapReduce cluster was started and connected to a fully distributed HDFS instance. An initial distcp from HDFS down to the local filesystem succeeded. The following day, another distcp was run with:

      $ bin/hadoop distcp -pugp -update hdfs://nn:7276/data/raw file:///data/raw

      It failed; its output is below:

      09/06/07 13:14:51 INFO tools.DistCp: srcPaths=[hdfs://nn:7276/data/raw]
      09/06/07 13:14:51 INFO tools.DistCp: destPath=file:/data/raw
      09/06/07 13:14:55 INFO tools.DistCp: srcCount=10955
      09/06/07 13:14:56 INFO mapred.JobClient: Running job: job_200906071310_0001
      09/06/07 13:14:57 INFO mapred.JobClient: map 0% reduce 0%
      09/06/07 13:15:24 INFO mapred.JobClient: map 1% reduce 0%
      09/06/07 13:17:34 INFO mapred.JobClient: map 2% reduce 0%
      09/06/07 13:20:04 INFO mapred.JobClient: map 3% reduce 0%
      09/06/07 13:20:49 INFO mapred.JobClient: map 4% reduce 0%
      09/06/07 13:21:44 INFO mapred.JobClient: map 5% reduce 0%
      09/06/07 13:22:33 INFO mapred.JobClient: map 6% reduce 0%
      09/06/07 13:25:14 INFO mapred.JobClient: map 7% reduce 0%
      09/06/07 13:27:14 INFO mapred.JobClient: map 8% reduce 0%
      09/06/07 13:33:34 INFO mapred.JobClient: map 9% reduce 0%
      09/06/07 13:37:30 INFO mapred.JobClient: map 10% reduce 0%
      09/06/07 13:40:05 INFO mapred.JobClient: map 11% reduce 0%
      09/06/07 13:44:55 INFO mapred.JobClient: map 12% reduce 0%
      09/06/07 13:48:55 INFO mapred.JobClient: map 13% reduce 0%
      09/06/07 13:54:41 INFO mapred.JobClient: map 14% reduce 0%
      09/06/07 13:58:30 INFO mapred.JobClient: map 15% reduce 0%
      09/06/07 14:00:46 INFO mapred.JobClient: map 16% reduce 0%
      09/06/07 14:01:36 INFO mapred.JobClient: map 17% reduce 0%
      09/06/07 14:04:12 INFO mapred.JobClient: map 13% reduce 0%
      09/06/07 14:04:12 INFO mapred.JobClient: Task Id : attempt_200906071310_0001_m_000006_0, Status : FAILED
      java.io.IOException: Copied: 0 Skipped: 264 Failed: 39
      at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:542)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

      09/06/07 14:04:19 INFO mapred.JobClient: Task Id : attempt_200906071310_0001_m_000006_1, Status : FAILED
      java.io.FileNotFoundException: File does not exist: hdfs://nn:7276/tmp/hadoop/mapred/system/distcp_m8n2e/_distcp_src_files
      at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:412)
      at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:684)
      at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1420)
      at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1415)
      at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
      at org.apache.hadoop.tools.DistCp$CopyInputFormat.getRecordReader(DistCp.java:272)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

      (several more tasks fail for the same reason with FileNotFoundException)

      With failures, global counters are inaccurate; consider running with -i
      Copy failed: java.io.IOException: Job failed!
      at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
      at org.apache.hadoop.tools.DistCp.copy(DistCp.java:619)
      at org.apache.hadoop.tools.DistCp.run(DistCp.java:768)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      at org.apache.hadoop.tools.DistCp.main(DistCp.java:788)

      This distcp update operation does succeed without -pugp.
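
The report does not state the root cause, but the title points at chgrp failing on the local destination. As a rough pre-flight sketch under that assumption, the helper below checks whether the current local user is root or already belongs to a given group, which is the usual precondition for a chgrp to that group to succeed. The function names `is_member` and `can_chgrp_to` are hypothetical and are not part of Hadoop or distcp.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of Hadoop or distcp.

# is_member GROUP "GROUP LIST"
# Succeeds when GROUP matches one whole entry of the space-separated list.
is_member() {
    target=$1
    for g in $2; do
        [ "$g" = "$target" ] && return 0
    done
    return 1
}

# can_chgrp_to GROUP
# Succeeds when the current local user is root or already belongs to GROUP.
# Assumption: group names contain no whitespace.
can_chgrp_to() {
    [ "$(id -u)" -eq 0 ] && return 0
    is_member "$1" "$(id -Gn)"
}

# Example (the group name "hadoop" is a placeholder):
#   can_chgrp_to hadoop || echo "chgrp to 'hadoop' would likely fail" >&2
```

Running such a check for each group that owns files under the source path, before invoking distcp with -pugp, would surface a likely permission problem up front rather than as a bare "Failed: 39" counter in a mapper exception.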

          People

            Assignee: Unassigned
            Reporter: Aaron Kimball (kimballa)
            Votes: 1
            Watchers: 3