  1. Hive
  2. HIVE-17257

Hive should merge empty files

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None

    Description

      Currently, if the file-merging option is turned on and the destination directory contains a large number of empty files, Hive will not trigger a merge task:

        private long getMergeSize(FileSystem inpFs, Path dirPath, long avgSize) {
          AverageSize averageSize = getAverageSize(inpFs, dirPath);
          if (averageSize.getTotalSize() <= 0) {
            return -1;
          }
      
          if (averageSize.getNumFiles() <= 1) {
            return -1;
          }
      
          if (averageSize.getTotalSize()/averageSize.getNumFiles() < avgSize) {
            return averageSize.getTotalSize();
          }
          return -1;
        }
      

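      This logic doesn't seem right: when every file is empty, getTotalSize() is 0, so the first check returns -1 and no merge task is triggered. It would be better to combine these empty files into one.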
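
To make the behavior concrete, here is a minimal standalone sketch. It is not the code from the attached patches. The class `MergeSizeSketch`, both method names, and the 16 MB threshold are hypothetical; the total size and file count are passed in directly instead of being read from a `FileSystem`. The first method mirrors the checks quoted above, and the second shows one possible adjustment, assuming the caller treats any non-negative return value (including 0) as "merge".

```java
// Standalone sketch under stated assumptions; not the actual Hive implementation.
class MergeSizeSketch {

    // Mirrors the checks quoted in the description, with sizes passed in directly.
    static long originalMergeSize(long totalSize, int numFiles, long avgSize) {
        if (totalSize <= 0) {
            return -1; // a directory of empty files (total size 0) is rejected here
        }
        if (numFiles <= 1) {
            return -1; // nothing to merge
        }
        if (totalSize / numFiles < avgSize) {
            return totalSize;
        }
        return -1;
    }

    // Hypothetical adjustment: only a negative total size is rejected, so a
    // directory of empty files falls through to the average-size check.
    // Assumes the caller accepts 0 as a valid merge size.
    static long adjustedMergeSize(long totalSize, int numFiles, long avgSize) {
        if (totalSize < 0) {
            return -1;
        }
        if (numFiles <= 1) {
            return -1; // a single file still needs no merge
        }
        if (totalSize / numFiles < avgSize) {
            return totalSize;
        }
        return -1;
    }

    public static void main(String[] args) {
        long avgSize = 16_000_000L; // assumed threshold, for illustration only
        // 100 empty files: the original checks skip the merge, the adjusted ones do not.
        System.out.println(originalMergeSize(0, 100, avgSize));
        System.out.println(adjustedMergeSize(0, 100, avgSize));
    }
}
```

The adjustment only helps if the code that consumes the return value distinguishes -1 ("do not merge") from 0 ("merge, total size is zero"); a caller that tests for a strictly positive size would still skip the merge.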

      Attachments

        1. HIVE-17257.0.patch
          0.7 kB
          Chao Sun
        2. HIVE-17257.1.patch
          2 kB
          Chao Sun
        3. HIVE-17257.2.patch
          0.7 kB
          Chao Sun
        4. HIVE-17257.3.patch
          3 kB
          Chao Sun

          People

            Assignee: Chao Sun (csun)
            Reporter: Chao Sun (csun)
            Votes: 0
            Watchers: 5
