[HIVE-7826] Dynamic partition pruning on Tez - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.14.0
Component/s: Tez
Labels:
- TODOC14
- tez

Description

It's natural in a star schema to map one or more dimensions to partition columns. Time or location are likely candidates.

It can also be useful to compute the partitions one would like to scan via a subquery (where p in select ... from ...).

The resulting joins in Hive require a full table scan of the large table, though, because partition pruning takes place before the corresponding values are known.

On Tez it's relatively straightforward to send the values needed to prune to the application master, where splits are generated and tasks are submitted. Using these values we can strip out any unneeded partitions dynamically, while the query is running.
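The effect of pruning at run time can be illustrated with a minimal Python sketch. It is not Hive or Tez code: the table contents, column names, paths, and both helper functions are hypothetical, and the set of keys stands in for the values that would be sent to the application master.

```python
# Illustrative sketch only; every name and value here is hypothetical.

def collect_join_keys(dimension_rows, key_column, row_filter):
    """Gather the distinct join-key values from the filtered dimension side."""
    return {row[key_column] for row in dimension_rows if row_filter(row)}


def prune_partitions(partitions, keys):
    """Keep only the partitions whose partition-column value is among the keys."""
    return [p for p in partitions if p["value"] in keys]


# Small dimension table: one row per day.
date_dim = [
    {"d_date": "2014-08-30", "d_is_weekend": True},
    {"d_date": "2014-08-31", "d_is_weekend": True},
    {"d_date": "2014-09-01", "d_is_weekend": False},
]

# Large fact table partitioned by day; each partition maps to a directory.
sales_partitions = [
    {"value": "2014-08-30", "path": "/warehouse/sales/ds=2014-08-30"},
    {"value": "2014-08-31", "path": "/warehouse/sales/ds=2014-08-31"},
    {"value": "2014-09-01", "path": "/warehouse/sales/ds=2014-09-01"},
    {"value": "2014-09-02", "path": "/warehouse/sales/ds=2014-09-02"},
]

# The keys only become known once the dimension side has been evaluated.
keys = collect_join_keys(date_dim, "d_date", lambda row: row["d_is_weekend"])
to_scan = prune_partitions(sales_partitions, keys)
print([p["value"] for p in to_scan])
```

Without the key set, all four partitions would have to be scanned; with it, only the two weekend partitions remain.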

The approach is straightforward:

- Insert synthetic conditions for each join representing "x in (keys of other side in join)".
- These conditions will be pushed as far down as possible.
- If the condition hits a table scan and the column involved is a partition column:
  - Set up an operator to send key events to the AM.
- Otherwise:
  - Remove the synthetic predicate.
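The planning steps above can be sketched roughly as follows. This is a simplified Python model, not the optimizer code from the patch: the `TableScan` and `JoinSide` classes, the `plan_pruning` function, and the action labels are all hypothetical, and predicate pushdown is reduced to the assumption that each synthetic condition reaches the table scan on its own side of the join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableScan:
    table: str
    partition_columns: frozenset


@dataclass(frozen=True)
class JoinSide:
    scan: TableScan
    key_column: str


def plan_pruning(left, right):
    """Decide, for each side of an equi-join, what happens to its synthetic predicate."""
    decisions = []
    for target, source in ((left, right), (right, left)):
        # Synthetic condition: "x in (keys of other side in join)".
        predicate = (
            f"{target.scan.table}.{target.key_column} IN "
            f"(keys of {source.scan.table}.{source.key_column})"
        )
        if target.key_column in target.scan.partition_columns:
            # The condition hit a table scan on a partition column:
            # the other side should send its key values to the AM.
            decisions.append(("send_key_events", predicate))
        else:
            # Not a partition column, so the condition cannot prune anything.
            decisions.append(("remove", predicate))
    return decisions


sales = TableScan("store_sales", frozenset({"ss_sold_date"}))
dates = TableScan("date_dim", frozenset())

decisions = plan_pruning(JoinSide(sales, "ss_sold_date"), JoinSide(dates, "d_date"))
for action, predicate in decisions:
    print(action, "->", predicate)
```

In this example only the condition on `store_sales.ss_sold_date` survives, because `date_dim` has no partition columns.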

Add these properties:

Property	Default Value
`hive.tez.dynamic.partition.pruning`	true
`hive.tez.dynamic.partition.pruning.max.event.size`	1*1024*1024L
`hive.tez.dynamic.partition.pruning.max.data.size`	100*1024*1024L
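A small Python sketch of how such limits might act as guards. It assumes the two size defaults are byte counts of 1*1024*1024 and 100*1024*1024, and that exceeding either limit means no pruning takes place and all partitions are scanned. The function name and the fallback behaviour are assumptions for illustration, not taken from this issue.

```python
# Assumed defaults, in bytes.
MAX_EVENT_SIZE = 1 * 1024 * 1024    # 1,048,576 bytes (1 MiB)
MAX_DATA_SIZE = 100 * 1024 * 1024   # 104,857,600 bytes (100 MiB)


def pruning_allowed(event_sizes, enabled=True,
                    max_event_size=MAX_EVENT_SIZE,
                    max_data_size=MAX_DATA_SIZE):
    """Return True if pruning may proceed for the given event sizes (bytes)."""
    if not enabled:
        return False
    # A single oversized event disables pruning.
    if any(size > max_event_size for size in event_sizes):
        return False
    # The combined size of all events must also stay within the data limit.
    return sum(event_sizes) <= max_data_size


print(pruning_allowed([512 * 1024] * 10))
print(pruning_allowed([2 * 1024 * 1024]))
```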

Attachments

File	Date	Size	Author
HIVE-7826.1.patch	21/Aug/14 05:51	447 kB	Gunther Hagleitner
HIVE-7826.2.patch	21/Aug/14 06:03	322 kB	Gunther Hagleitner
HIVE-7826.3.patch	25/Aug/14 08:00	328 kB	Gunther Hagleitner
HIVE-7826.4.patch	26/Aug/14 23:24	330 kB	Gunther Hagleitner
HIVE-7826.5.patch	01/Sep/14 06:29	407 kB	Gunther Hagleitner
HIVE-7826.6.patch	02/Sep/14 03:06	421 kB	Gunther Hagleitner
HIVE-7826.7.patch	03/Sep/14 10:32	440 kB	Gunther Hagleitner

Issue Links

is blocked by

- HIVE-6988 Hive changes for tez-0.5.x compatibility (Closed)

is duplicated by

- HIVE-5119 MapJoin & Partition Pruning (MapJoin can take advantage of materialized data to prune partitions of big table) (Resolved)

is related to

- TEZ-1447 Provide a mechanism for InputInitializers to know about interesting Vertex state changes (Closed)

relates to

- HIVE-12228 Hive 0.13.1 Error for nested query with UDF returns Struct type (Resolved)
- HIVE-7976 Merge tez branch into trunk (tez 0.5.0) (Closed)
- HIVE-8018 Fix typo in config var name for dynamic partition pruning (Closed)

links to

- Review Board #25019

People

Assignee: Gunther Hagleitner

Reporter: Gunther Hagleitner

Votes: 0

Watchers: 10

Dates

Created: 21/Aug/14 05:44

Updated: 13/Sep/16 05:19

Resolved: 03/Sep/14 10:48