Hadoop Common / HADOOP-2049

distcp does not fail if source directory has files with missing blocks

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.15.0
    • Component/s: util
    • Labels:
      None
    • Environment:

      Nightly build: Oct 11, 2007.

      Description

      I copied a directory using distcp (to another directory on the same file system).

      Nine data blocks were missing from files in the source directory, which caused distcp to print messages like the following:

      ...
      07/10/13 00:09:16 INFO mapred.JobClient: map 1% reduce 0%
      07/10/13 00:09:16 INFO mapred.JobClient: Task Id : task_200710120717_0081_m_000020_0, Status : FAILED
      java.io.IOException: Could not obtain block: blk_6787282547149034655 file=/srcdir/file1
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1136)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:988)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1094)
      at java.io.DataInputStream.read(DataInputStream.java:83)
      at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.copy(CopyFiles.java:289)
      at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:348)
      at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:216)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1753)
      ...

      The corresponding task attempts failed, but the retries were reported as successful: every file with missing blocks in the source directory was copied as an empty file in the target directory.

      I think that distcp should fail if it cannot successfully copy all the files (at least when no command-line options are given).

      This is critical for us because we intend to use distcp to copy databases from one DFS to another. If silent failures can happen, we would have to check each distcp run manually to ensure that it succeeded.
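
      Until distcp itself fails in this case, the manual check described above could be automated by comparing file lengths after the copy. The following is only a minimal sketch under stated assumptions: it walks two directories through java.nio.file as a local stand-in for the source and target trees, and the class name CopyVerifier and method findMismatches are hypothetical, not part of Hadoop or distcp. A length comparison would flag the empty files seen here, but it would not detect files of the right length with wrong contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: checks a finished copy by file length.
public class CopyVerifier {

    /**
     * Returns the relative paths of regular files under src that are either
     * missing under dst or have a different length there.
     */
    public static List<String> findMismatches(Path src, Path dst) throws IOException {
        List<Path> sourceFiles;
        try (Stream<Path> walk = Files.walk(src)) {
            sourceFiles = walk.filter(Files::isRegularFile)
                              .sorted()
                              .collect(Collectors.toList());
        }

        List<String> mismatches = new ArrayList<>();
        for (Path file : sourceFiles) {
            Path relative = src.relativize(file);
            Path copy = dst.resolve(relative);
            // An empty or truncated copy shows up as a length difference.
            if (!Files.isRegularFile(copy) || Files.size(copy) != Files.size(file)) {
                mismatches.add(relative.toString());
            }
        }
        return mismatches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CopyVerifier <srcdir> <dstdir>");
            System.exit(2);
        }
        List<String> bad = findMismatches(Paths.get(args[0]), Paths.get(args[1]));
        for (String path : bad) {
            System.err.println("MISMATCH: " + path);
        }
        // A non-zero exit status lets a calling script treat the copy as failed.
        System.exit(bad.isEmpty() ? 0 : 1);
    }
}
```

      Exiting with a non-zero status on any mismatch gives a wrapper script the hard failure that the copy itself did not report.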

            People

            • Assignee:
              cdouglas Christopher Douglas
              Reporter:
              mabasrai Murtaza A. Basrai
