
merge small files after a map-only job


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: HIVE-439. Merge small files after a map-only job. (Namit Jain via zshao)

    Description

      There are cases where the input to a Hive job consists of thousands of small files. In such cases, one mapper is spawned per file. Most of the overhead of spawning all these mappers can be avoided if the small files are combined into fewer, larger files.

      The problem can also be addressed by having a mapper span multiple blocks as in:

      https://issues.apache.org/jira/browse/HIVE-74

      But it also makes sense for Hive to merge files whenever possible.

      <property>
        <name>hive.merge.mapfiles</name>
        <value>true</value>
        <description>Merge small files at the end of the job</description>
      </property>
      
      <property>
        <name>hive.merge.size.per.task</name>
        <value>256000000</value>
        <description>Size of merged files at the end of the job</description>
      </property>
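
As a rough illustration of what these two settings imply, the sketch below estimates how many files a merge step would leave behind if each merged file is about `hive.merge.size.per.task` bytes. The helper name `estimate_merged_file_count` and the example file sizes are hypothetical, and the calculation is only back-of-the-envelope arithmetic under that assumption, not the algorithm Hive actually uses.

```python
import math

# Hypothetical helper, not part of Hive: assumes each merged file ends up
# roughly `size_per_task` bytes (the hive.merge.size.per.task value above).
def estimate_merged_file_count(file_sizes, size_per_task=256_000_000):
    total_bytes = sum(file_sizes)
    if total_bytes == 0:
        return 0
    # Round up: a partially filled last file still counts as one file.
    return math.ceil(total_bytes / size_per_task)

# Assumed example input: 5000 small files of about 1 MB each.
small_files = [1_000_000] * 5000
print(len(small_files), "->", estimate_merged_file_count(small_files))
# prints: 5000 -> 20
```

Under these assumptions, a later job reading the merged output would need on the order of 20 mappers rather than 5000.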
      

      Attachments

        1. hive.439.1.patch
          1.50 MB
          Namit Jain
        2. hive.439.2.patch
          1.45 MB
          Namit Jain
        3. hive.439.3.patch
          1.45 MB
          Namit Jain
        4. hive.439.4.patch
          1.45 MB
          Namit Jain
        5. hive.439.5.patch
          1.45 MB
          Namit Jain


          People

            Assignee: Namit Jain
            Reporter: Namit Jain
            Votes: 0
            Watchers: 3
