Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
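In FileUtils.java, the following code is not executed even when the size of the src directory exceeds HIVE_EXEC_COPYFILE_MAXSIZE, because srcFS.getFileStatus(src).getLen() returns 0 when src is a directory. As a result, the size check never passes for a directory and distcp is never launched for it. We should use srcFS.getContentSummary(src).getLength() instead, which reports the total length of the files under src.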
/* Run distcp if source file/dir is too big */
if (srcFS.getUri().getScheme().equals("hdfs") &&
    srcFS.getFileStatus(src).getLen() > conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE)) {
  LOG.info("Source is " + srcFS.getFileStatus(src).getLen() + " bytes. (MAX: "
      + conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE) + ")");
  LOG.info("Launch distributed copy (distcp) job.");
  HiveConfUtil.updateJobCredentialProviders(conf);
  copied = shims.runDistCp(src, dst, conf);
  if (copied && deleteSource) {
    srcFS.delete(src, true);
  }
}
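To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the Hadoop FileSystem API; instead it models it with a hypothetical in-memory Entry class whose getLen() returns 0 for directories (the behaviour described above) and whose contentLength() sums file lengths recursively, playing the role of getContentSummary(src).getLength(). The names CopySizeCheck, Entry, exceedsMaxByLen and exceedsMaxByContent, and the 32 MB threshold, are assumptions made for illustration only and do not come from Hive.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of the size check; not Hive or Hadoop code. */
final class CopySizeCheck {

    /** A file or directory entry in the model. */
    static final class Entry {
        final boolean isDirectory;
        final long fileLength; // bytes; only meaningful for files
        final List<Entry> children = new ArrayList<>();

        private Entry(boolean isDirectory, long fileLength) {
            this.isDirectory = isDirectory;
            this.fileLength = fileLength;
        }

        static Entry file(long length) {
            return new Entry(false, length);
        }

        static Entry dir(Entry... kids) {
            Entry d = new Entry(true, 0L);
            for (Entry k : kids) {
                d.children.add(k);
            }
            return d;
        }

        /** Models the reported behaviour: a directory's own length is 0. */
        long getLen() {
            return isDirectory ? 0L : fileLength;
        }

        /** Recursive total of all file lengths under this entry. */
        long contentLength() {
            long total = getLen();
            for (Entry c : children) {
                total += c.contentLength();
            }
            return total;
        }
    }

    /** Shape of the original check: never true when src is a directory. */
    static boolean exceedsMaxByLen(Entry src, long maxSize) {
        return src.getLen() > maxSize;
    }

    /** Shape of the proposed check: compares the total content size. */
    static boolean exceedsMaxByContent(Entry src, long maxSize) {
        return src.contentLength() > maxSize;
    }

    public static void main(String[] args) {
        long maxSize = 32L * 1024 * 1024; // illustrative threshold only
        long twentyMb = 20L * 1024 * 1024;
        // A directory holding 40 MB in total, split across a nested directory.
        Entry src = Entry.dir(Entry.file(twentyMb), Entry.dir(Entry.file(twentyMb)));

        System.out.println(exceedsMaxByLen(src, maxSize));     // false
        System.out.println(exceedsMaxByContent(src, maxSize)); // true
    }
}
```

In this model the directory holds 40 MB against a 32 MB limit, yet the length-based check returns false, so the distcp branch would be skipped. The content-based check returns true, which is the behaviour the proposed change is after.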