Hive / HIVE-12638

Hive should not create empty files in partitions

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: File Formats
    • Labels: None

      Description

      Currently, Hive creates an empty file for every bucket that has no rows in a directory. I believe this was originally done because the SMB and bucket joins require the files to be present in order to get InputSplits. For some customers, this behavior leads to the creation of more than 200,000 empty ORC files per hour on a cluster (with peaks of more than 725,000 per hour). We've also seen instances where a single DataNode is involved in 5,600 of these empty ORC files within a 2-minute period. This causes significant stress on HDFS at both the NameNode and the DataNodes and is completely unnecessary.
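
      As a rough illustration of how empty buckets arise, the sketch below assigns rows to buckets with a simple hash-modulo rule and lists the buckets that receive no rows. It is not Hive's code: the function names, the sample keys, and the simplification that an integer key hashes to itself are all assumptions made for the example.

```python
from collections import Counter


def bucket_for(key: int, num_buckets: int) -> int:
    # Assumption: bucket = non-negative hash modulo bucket count,
    # with the hash of an integer key taken to be the key itself.
    return (key & 0x7FFFFFFF) % num_buckets


def empty_buckets(keys, num_buckets):
    # Count how many rows land in each bucket, then report the
    # bucket ids that received none.
    used = Counter(bucket_for(k, num_buckets) for k in keys)
    return [b for b in range(num_buckets) if used[b] == 0]


# Hypothetical partition: 3 rows written to a table declared with 8 buckets.
keys = [10, 18, 3]
print(empty_buckets(keys, 8))  # [0, 1, 4, 5, 6, 7]
```

      In this hypothetical case, six of the eight buckets hold no rows, so a writer that emits one file per bucket regardless of content would produce six empty files for a single small partition.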

            People

            • Assignee: Unassigned
            • Reporter: Owen O'Malley (omalley)
            • Votes: 0
            • Watchers: 6
