Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- 0.16.0
- None
Nightly build: http://hadoopqa.yst.corp.yahoo.com:8080/hudson/job/Hadoop-LinuxTest/770/
With patches for HADOOP-2095 and HADOOP-2119.
Description
We used distcp to copy ~100 TB of data between two clusters of ~1400 nodes each.
Command used (it was run on the src cluster):
hadoop distcp -log /logdir/logfile hdfs://src-namenode:8600//src-dir-1 hdfs://src-namenode:8600//src-dir-2 ... hdfs://src-namenode:8600//src-dir-n hdfs://tgt-namenode:8600//dst-dir
Distcp completed without errors, but when we checked the file sizes on the src and tgt clusters, we noticed differences in file sizes for 9 files (~6 GB).
src-file-1 666762714 bytes -> tgt-file-1 134217728 bytes
src-file-2 673791814 bytes -> tgt-file-2 536870912 bytes
src-file-3 692172075 bytes -> tgt-file-3 0 bytes
All target files are truncated at block boundaries (some have 0 size).
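The block-boundary observation can be checked with a few lines of Python. This is only a sketch: the block size is not stated in this report, so 64 MB is an assumption (the reported target sizes are also multiples of 128 MB), and the function name is hypothetical.

```python
# Sizes reported in this issue: (source bytes, target bytes).
REPORTED = {
    "file-1": (666762714, 134217728),
    "file-2": (673791814, 536870912),
    "file-3": (692172075, 0),
}

# Assumption: the block size is not given in the report. 64 MB is used here;
# the reported target sizes are multiples of 128 MB as well.
BLOCK_SIZE = 64 * 1024 * 1024


def is_truncated_at_block_boundary(src_size, tgt_size, block_size=BLOCK_SIZE):
    """True if the target is shorter than the source and ends exactly on a block boundary."""
    return tgt_size < src_size and tgt_size % block_size == 0


if __name__ == "__main__":
    for name, (src, tgt) in REPORTED.items():
        print(name,
              "full blocks on target:", tgt // BLOCK_SIZE,
              "bytes missing:", src - tgt,
              "truncated at boundary:", is_truncated_at_block_boundary(src, tgt))
```

A target size that is an exact multiple of the block size, while the source is larger, suggests the copy stopped between blocks rather than partway through a write.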
I looked at the log files, and noticed a few things:
1. There are 31059 log files (same as the number of Maps the job had).
2. 246 of the log files are non-empty.
3. All non-empty log files are of the form:
SKIP: hdfs://src-namenode/src-dir-a/src-file-x
SKIP: hdfs://src-namenode/src-dir-b/src-file-y
SKIP: hdfs://src-namenode/src-dir-c/src-file-z
4. All 9 files which were truncated were included in the log files as skipped files.
5. All 9 files were the last entry in their respective log files.
e.g.
Non-empty logfile 1:
SKIP: hdfs://src-namenode/src-dir-a/src-file-x
SKIP: hdfs://src-namenode/src-dir-b/src-file-y
SKIP: hdfs://src-namenode/src-dir-c/src-file-z <-- Truncated file
Non-empty logfile 2:
SKIP: hdfs://src-namenode/src-dir-p/src-file-m
SKIP: hdfs://src-namenode/src-dir-q/src-file-n <-- Truncated file
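The cross-check described in items 4 and 5 could be scripted roughly as follows. This is a sketch, not the procedure actually used: it assumes each log has already been read into a string and that every entry has the `SKIP: <path>` form shown above. The function names are hypothetical.

```python
def skipped_paths(log_text):
    """Return the paths from 'SKIP: <path>' lines, in file order."""
    prefix = "SKIP: "
    paths = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Keep only the path; drop any trailing annotation on the line.
            paths.append(line[len(prefix):].split()[0])
    return paths


def last_skipped(log_text):
    """Return the last skipped path in one log file, or None if the log has no SKIP lines."""
    paths = skipped_paths(log_text)
    return paths[-1] if paths else None


if __name__ == "__main__":
    # Contents of "Non-empty logfile 2" as quoted in this report.
    example_log = (
        "SKIP: hdfs://src-namenode/src-dir-p/src-file-m\n"
        "SKIP: hdfs://src-namenode/src-dir-q/src-file-n\n"
    )
    print(last_skipped(example_log))
```

Collecting `last_skipped` over all non-empty logs and comparing the result with the list of truncated files would confirm whether every truncated file is the final entry of its log.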
Attachments
Issue Links
- depends upon: HADOOP-2754 Path filter for Local file system list .crc files (Closed)
- incorporates: HADOOP-2807 distcp creating a file instead of a target directory (with single file source dir) (Closed)