[HIVE-15796] HoS: poor reducer parallelism when operator stats are not accurate - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.3.0
Component/s: Statistics
Labels:
- TODOC2.2

Description

In HoS (Hive on Spark) we currently use operator stats to determine reducer parallelism. However, operator stats are often inaccurate, especially when column stats are not available. This can produce extremely poor reducer parallelism and cause a HoS query to run for a very long time.

This JIRA offers an alternative way to compute reducer parallelism, similar to what MR does. Here's the approach we are suggesting:
1. when computing the parallelism for a MapWork, use stats associated with the TableScan operator;
2. when computing the parallelism for a ReduceWork, use the maximum parallelism from all its parents.
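
The two rules above can be sketched as follows. This is only an illustration under stated assumptions, not the code in the attached patches: the `Work` class, its fields, and the `parallelism` function are hypothetical names, and the per-MapWork estimate assumes an MR-style formula (input bytes divided by a bytes-per-reducer setting, capped by a maximum), in the spirit of `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Work:
    """Hypothetical stand-in for a MapWork or ReduceWork node in the plan."""
    name: str
    kind: str  # "map" or "reduce"
    # Data sizes (bytes) reported by the TableScan operators of a MapWork.
    table_scan_bytes: List[int] = field(default_factory=list)
    # Upstream works feeding a ReduceWork.
    parents: List["Work"] = field(default_factory=list)


def parallelism(work: Work, bytes_per_reducer: int, max_reducers: int) -> int:
    """Estimate parallelism without relying on downstream operator stats."""
    if work.kind == "map":
        # Rule 1: use only the stats attached to the TableScan operators.
        total = sum(work.table_scan_bytes)
        # Integer ceiling division (assumed MR-style estimate).
        estimate = (total + bytes_per_reducer - 1) // bytes_per_reducer
        return max(1, min(max_reducers, estimate))
    if not work.parents:
        raise ValueError(f"reduce work {work.name!r} has no parents")
    # Rule 2: a ReduceWork takes the maximum parallelism of its parents.
    return max(parallelism(p, bytes_per_reducer, max_reducers)
               for p in work.parents)


# Usage: a join of a 10 GiB table and a 1 GiB table, 256 MiB per reducer.
GIB = 1024 ** 3
map_big = Work("Map 1", "map", table_scan_bytes=[10 * GIB])
map_small = Work("Map 2", "map", table_scan_bytes=[1 * GIB])
join = Work("Reducer 2", "reduce", parents=[map_big, map_small])
print(parallelism(join, bytes_per_reducer=256 * 1024 ** 2, max_reducers=1009))
```

Because the estimate depends only on table-level sizes, it stays stable even when column stats are missing, at the cost of ignoring any filtering that happens after the scan.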

Attachments

- HIVE-15796.1.patch, 08/Feb/17 07:47, 50 kB, Chao Sun
- HIVE-15796.2.patch, 08/Feb/17 18:12, 52 kB, Chao Sun
- HIVE-15796.3.patch, 14/Feb/17 06:04, 11 kB, Chao Sun
- HIVE-15796.4.patch, 18/Feb/17 01:57, 29 kB, Chao Sun
- HIVE-15796.5.patch, 18/Feb/17 20:18, 30 kB, Chao Sun
- HIVE-15796.6.patch, 18/Feb/17 22:22, 30 kB, Chao Sun
- HIVE-15796.wip.1.patch, 03/Feb/17 04:59, 9 kB, Chao Sun
- HIVE-15796.wip.2.patch, 03/Feb/17 23:33, 9 kB, Chao Sun
- HIVE-15796.wip.patch, 03/Feb/17 01:21, 9 kB, Chao Sun

Issue Links

relates to

HIVE-16009 HoS: refactor set reducer parallelism

Open

People

Assignee: Chao Sun

Reporter: Chao Sun

Votes: 0

Watchers: 4

Dates

Created: 03/Feb/17 00:09

Updated: 21/Jul/17 18:36

Resolved: 22/Feb/17 17:31