[HIVE-1118] Add hive.merge.size.per.task to HiveConf - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.6.0
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

Currently, by default, we get one reducer for each 1GB of input data.
The same default applies to the conditional merge job, which runs if the average file size is smaller than a threshold.

This makes those jobs very slow, because each reducer needs to consume 1GB of data.

Alternatively, we can just use that threshold to determine the number of reducers per job (or introduce a new parameter).
For example, if the threshold is 1MB, we start the merge job only if the average file size is less than 1MB, and each resulting file will be around 1MB (or another small number).

This will remove the extreme cases where we have thousands of empty files, but still make normal jobs fast enough.
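
The proposal above can be illustrated with a minimal sketch. It is not Hive's actual implementation: the function names `should_merge` and `merge_reducer_count` are hypothetical, and the rounding rule (ceiling of total bytes divided by the per-task size) is an assumption. The `size_per_task` argument plays the role that the `hive.merge.size.per.task` parameter named in the issue title would presumably play.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def should_merge(file_sizes, avg_size_threshold):
    """Return True when the average file size is below the threshold."""
    if not file_sizes:
        return False
    return sum(file_sizes) / len(file_sizes) < avg_size_threshold


def merge_reducer_count(file_sizes, size_per_task):
    """Assumed rule: one reducer per size_per_task bytes of input, at least one."""
    total_bytes = sum(file_sizes)
    return max(1, math.ceil(total_bytes / size_per_task))


# 1000 small files of 100KB each (about 98MB in total).
small_files = [100 * 1024] * 1000

print(should_merge(small_files, 1 * MB))
print(merge_reducer_count(small_files, 1 * GB))  # current default: 1GB per reducer
print(merge_reducer_count(small_files, 1 * MB))  # proposed: small size per task
```

Under these assumptions, the 1GB default sends all of the input through a single reducer, while a 1MB per-task size spreads the same merge over many reducers, each writing a file of roughly 1MB.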

Attachments

HIVE-1118.1.patch  01/Feb/10 22:02  2 kB  Zheng Shao
HIVE-1118.2.patch  01/Feb/10 22:09  2 kB  Zheng Shao
HIVE-1118.3.patch  01/Feb/10 22:17  1 kB  Zheng Shao

People

Assignee: Zheng Shao

Reporter: Zheng Shao

Votes: 1

Watchers: 1

Dates

Created: 29/Jan/10 05:33

Updated: 17/Dec/11 00:03

Resolved: 02/Feb/10 01:14