Details
Description
When building a cube, I noticed that the "build dictionary" and "calculate cube" steps started a very large number of mappers (more than 10,000). The logs showed that many of these mappers had zero or very few records to process, which confused me.
I then checked the storage location of the flat table and found it contained many files; a count showed the number of files matched the number of mappers.
Too many mappers cause significant overhead and degrade the cluster's performance; Kylin should ask Hive to merge these small files during the "create flat table" step.
In my Hadoop cluster, hive.merge.mapredfiles was set to false (the default value). After changing it to true for Kylin's job, the intermediate table's file count was reduced to 4, each file up to 256 MB, which looks good. See the Hive configuration reference at: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
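For reference, the merge behavior described above can be enabled per-session with Hive SET statements such as the following sketch; the size values shown are Hive's documented defaults, and they should be tuned for the target cluster:

```sql
-- Merge the small output files of map-reduce jobs (default: false)
SET hive.merge.mapredfiles=true;
-- Merge the outputs of map-only jobs as well (default: true)
SET hive.merge.mapfiles=true;
-- Target size of each merged file, in bytes (default: 256 MB)
SET hive.merge.size.per.task=256000000;
-- Trigger a merge pass when the average output file size
-- falls below this threshold, in bytes (default: 16 MB)
SET hive.merge.smallfiles.avgsize=16000000;
```

Setting these only for the session that creates Kylin's intermediate flat table avoids changing cluster-wide Hive defaults.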