Hive / HIVE-14864

Distcp is not called from MoveTask when src is a directory

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None

    Description

      In FileUtils.java, the following code is not executed even when the size of the src directory exceeds HIVE_EXEC_COPYFILE_MAXSIZE, because
      srcFS.getFileStatus(src).getLen() returns 0 when src is a directory. We should use srcFS.getContentSummary(src).getLength() instead, which returns the total size of the files under src.

          /* Run distcp if source file/dir is too big */
          if (srcFS.getUri().getScheme().equals("hdfs") &&
              srcFS.getFileStatus(src).getLen() > conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE)) {
            LOG.info("Source is " + srcFS.getFileStatus(src).getLen() + " bytes. (MAX: " + conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE) + ")");
            LOG.info("Launch distributed copy (distcp) job.");
            HiveConfUtil.updateJobCredentialProviders(conf);
            copied = shims.runDistCp(src, dst, conf);
            if (copied && deleteSource) {
              srcFS.delete(src, true);
            }
          }
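
      The difference between the two size checks can be illustrated with a minimal, self-contained sketch. It does not use Hadoop: `Entry` is a hypothetical stand-in for a path, `statusLen()` mimics the behaviour reported above for `getFileStatus(src).getLen()` (0 for a directory), and `contentLength()` mimics `getContentSummary(src).getLength()` (total bytes of all files beneath the path). The threshold value and all class and method names are assumptions made for illustration; this is not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: models the size check from the description without any Hadoop dependency.
class DistCpSizeCheckSketch {

    // Hypothetical stand-in for a file or directory on the source file system.
    static final class Entry {
        private final boolean directory;
        private final long fileBytes;
        private final List<Entry> children = new ArrayList<>();

        private Entry(boolean directory, long fileBytes) {
            this.directory = directory;
            this.fileBytes = fileBytes;
        }

        static Entry file(long bytes) {
            return new Entry(false, bytes);
        }

        static Entry dir(Entry... kids) {
            Entry d = new Entry(true, 0L);
            for (Entry k : kids) {
                d.children.add(k);
            }
            return d;
        }

        // Mimics getFileStatus(src).getLen(): a directory reports 0 bytes.
        long statusLen() {
            return directory ? 0L : fileBytes;
        }

        // Mimics getContentSummary(src).getLength(): sums file sizes recursively.
        long contentLength() {
            if (!directory) {
                return fileBytes;
            }
            long total = 0L;
            for (Entry child : children) {
                total += child.contentLength();
            }
            return total;
        }
    }

    // The check as quoted in the description: never true for a directory.
    static boolean exceedsWithStatusLen(Entry src, long maxSize) {
        return src.statusLen() > maxSize;
    }

    // The check as proposed in the description: based on total content size.
    static boolean exceedsWithContentLength(Entry src, long maxSize) {
        return src.contentLength() > maxSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long maxSize = 32L * mb; // assumed threshold, for illustration only
        Entry src = Entry.dir(Entry.file(20L * mb), Entry.file(20L * mb));

        System.out.println("status-length check triggers distcp:  "
                + exceedsWithStatusLen(src, maxSize));
        System.out.println("content-length check triggers distcp: "
                + exceedsWithContentLength(src, maxSize));
    }
}
```

      In this model a directory holding two 20 MB files never passes the first check, whatever the threshold, while the second check compares the full 40 MB against it.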
      

      Attachments

        1. HIVE-14864.patch
          4 kB
          Sahil Takiar
        2. HIVE-14864.4.patch
          11 kB
          Sahil Takiar
        3. HIVE-14864.3.patch
          4 kB
          Sahil Takiar
        4. HIVE-14864.2.patch
          3 kB
          Sahil Takiar
        5. HIVE-14864.1.patch
          3 kB
          Sahil Takiar

          People

            Assignee: Sahil Takiar (stakiar)
            Reporter: Vihang Karajgaonkar (vihangk1)
            Votes: 0
            Watchers: 6
