HIVE-2089

Add a new input format to be able to combine multiple .gz text files

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      CombineHiveInputFormat does not help with files that are not splittable. This JIRA is to add a new input format that can combine multiple .gz text files. This is very useful for partitions with tens of thousands of .gz files.
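
The sketch below illustrates the general idea of combining non-splittable files. It is not taken from the attached patch and is not Hive or Hadoop API. It assumes each .gz file must be read whole, so whole files are packed greedily into combined splits up to a size cap. The function name `combine_whole_files`, the `max_split_bytes` parameter, and the file names and sizes are all hypothetical.

```python
from typing import List, Tuple


def combine_whole_files(
    files: List[Tuple[str, int]], max_split_bytes: int
) -> List[List[str]]:
    """Greedily pack whole files into combined splits.

    Each file stays intact (it is never cut), because a .gz file
    cannot be split. A file larger than the cap gets a split of its own.
    """
    splits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in files:
        # Close the current split if adding this file would exceed the cap.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


# Hypothetical (path, size in bytes) pairs for one partition.
files = [
    ("part-0.gz", 40),
    ("part-1.gz", 30),
    ("part-2.gz", 50),
    ("part-3.gz", 200),
    ("part-4.gz", 10),
]
splits = combine_whole_files(files, max_split_bytes=100)
```

With tens of thousands of small .gz files, grouping like this would reduce the number of map tasks from one per file to roughly one per combined split, which is the motivation stated above.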

      1. HIVE-2089.1.patch
        45 kB
        He Yongqiang

        Issue Links

          Activity

          Steven Wong added a comment -

          I think Yongqiang was referring to MAPREDUCE-1597 in his comment. If you have MAPREDUCE-1597 and set hive.hadoop.supports.splittable.combineinputformat=true, you don't need HIVE-2089.

          He Yongqiang added a comment -

          Actually, I just found that the CombineFileInputFormat in recent Hadoop supports non-splittable files as input. So .gz files won't be a problem if your Hadoop has that feature checked in.

          Another use case for it is Hive's SymlinkInputFormat, which may point to too many .gz files.


            People

            • Assignee:
              Unassigned
              Reporter:
              He Yongqiang
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
