Description
When performing a massive distcp through hftp, we saw many tasks fail with:
2010-04-06 17:56:43,005 INFO org.apache.hadoop.tools.DistCp: FAIL 2010/0/part-00032 : java.io.IOException: File size not matched: copied 193855488 bytes (184.9m) to tmpfile (=hdfs://omehost.com:8020/somepath/part-00032) but expected 1710327403 bytes (1.6g) from hftp://someotherhost/somepath/part-00032
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:435)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:543)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:310)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
This means that the read itself did not fail, but the resulting file was somehow smaller than the source.
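To illustrate how a copy can come up short without any read error, here is a minimal sketch using plain java.io streams. It is not the DistCp code; the class name CopyLengthCheck and both method names are hypothetical. It assumes the source stream simply reports end-of-stream early (read() returns -1), so the copy loop exits normally and only a length comparison afterwards reveals the mismatch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration only; this is not the DistCp source.
class CopyLengthCheck {

    // Copies until read() returns -1 and reports how many bytes were written.
    // An early end-of-stream is indistinguishable from a normal one here.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Compares the copied byte count with the length the source advertised
    // and fails if they differ.
    static void copyAndVerify(InputStream in, OutputStream out, long expected)
            throws IOException {
        long copied = copy(in, out);
        if (copied != expected) {
            throw new IOException("File size not matched: copied " + copied
                    + " bytes but expected " + expected + " bytes");
        }
    }

    public static void main(String[] args) {
        // Assumed scenario: the source advertises 10 bytes, but the stream
        // ends after 4, as if the connection had been closed early.
        InputStream truncated = new ByteArrayInputStream(new byte[4]);
        try {
            copyAndVerify(truncated, new ByteArrayOutputStream(), 10);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the copy loop never throws; the failure only surfaces in the length check, which matches the symptom in the log above.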