Hive / HIVE-1332

Archiving partitions


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6.0
    • Component/s: Metastore
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Partitions and tables in Hive typically consist of many files on HDFS. As the number of files increases, so do the memory and load requirements on the namenode. Partitions in bucketed tables are a particular problem because each partition contains one file per bucket.
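
To make the scale concrete, here is a rough sketch of the file-count arithmetic for a bucketed table. The table shape (three years of daily partitions, 32 buckets) is a made-up example, and the count is only a lower bound that assumes exactly one file per bucket per partition.

```python
def file_count(num_partitions: int, num_buckets: int) -> int:
    """Lower bound on data files: one file per bucket in each partition."""
    return num_partitions * num_buckets


# Hypothetical table: 3 years of daily partitions, 32 buckets each.
total_files = file_count(num_partitions=3 * 365, num_buckets=32)
print(total_files)  # 35040
```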

      One way to drastically reduce the number of files is to use Hadoop archives (HAR):
      http://hadoop.apache.org/common/docs/current/hadoop_archives.html

      This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION <spec> statement that automatically packs the files for the partition into a HAR file. A corresponding UNARCHIVE option would convert the archive back to the original files. Archived partitions would be slower to access, but they would retain the same functionality while drastically reducing the number of files. Typically, only seldom-accessed partitions would be archived.
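
As an illustration of the proposed syntax, the sketch below builds ARCHIVE and UNARCHIVE statements from a partition spec. The helper name `partition_statement`, the table name `page_views`, and the partition column `ds` are all hypothetical. The rendering of <spec> as a parenthesized `col='value'` list is an assumption based on how Hive partition specs are usually written, not something this issue specifies.

```python
def partition_statement(table: str, spec: dict, action: str = "ARCHIVE") -> str:
    """Build an ALTER TABLE ... ARCHIVE/UNARCHIVE PARTITION statement.

    Illustrative only: values are quoted naively, with no escaping.
    """
    if action not in ("ARCHIVE", "UNARCHIVE"):
        raise ValueError(f"unsupported action: {action}")
    if not spec:
        raise ValueError("partition spec must name at least one column")
    # Assumed spec format: comma-separated col='value' pairs.
    clause = ", ".join(f"{col}='{val}'" for col, val in spec.items())
    return f"ALTER TABLE {table} {action} PARTITION ({clause})"


# Hypothetical table and partition column.
print(partition_statement("page_views", {"ds": "2010-04-01"}))
print(partition_statement("page_views", {"ds": "2010-04-01"}, action="UNARCHIVE"))
```

Under these assumptions, the first call yields a statement that would archive one day's partition and the second would restore it to plain files.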

      Hadoop archives are still fairly new, so we'll only support the latest released major version (0.20). Here are some related Hadoop bug fixes:

      https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could potentially cause data loss without this fix)
      https://issues.apache.org/jira/browse/HADOOP-6645
      https://issues.apache.org/jira/browse/MAPREDUCE-1585

      Attachments

        1. HIVE-1332.1.patch (65 kB, Paul Yang)
        2. HIVE-1332.2.patch (79 kB, Paul Yang)
        3. HIVE-1332.3.patch (83 kB, Paul Yang)
        4. HIVE-1332.4.patch (96 kB, Paul Yang)
        5. HIVE-1332.5.patch (97 kB, Paul Yang)
        6. HIVE-1332.6.patch (99 kB, Paul Yang)

            People

              Assignee: Paul Yang (pauly)
              Reporter: Paul Yang (pauly)