[HIVE-17486] Enable SharedWorkOptimizer in tez on HOS - ASF JIRA

Details

Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

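HIVE-16602 (Implement shared scans with Tez) added shared scans to Hive on Tez; this issue enables the same SharedWorkOptimizer in Hive on Spark (HoS).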

Given a query plan, the goal is to identify scans on input tables that can be merged so the data is read only once. The optimization is carried out at the physical level. Hive on Spark already caches the result of a Spark work when that work is used by more than one child Spark work. Once SharedWorkOptimizer is enabled in the HoS physical plan, identical table scans are merged into a single table scan, and the result of that scan feeds more than one child Spark work. Because of the caching mechanism, the scan is then computed only once instead of being repeated for each child.
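
The two ideas in the description (merging identical scans, then caching any work with more than one consumer) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Hive's implementation or API: `ScanWork`, `merge_identical_scans`, and `works_to_cache` are hypothetical names, the table and reducer names are illustrative, and the sketch only merges scans whose table, projected columns, and filter are exactly equal, whereas the real SharedWorkOptimizer handles more cases.

```python
from dataclasses import dataclass, field


@dataclass
class ScanWork:
    """Hypothetical stand-in for a Spark work rooted at a table scan."""
    table: str
    columns: tuple
    predicate: str
    children: list = field(default_factory=list)

    def signature(self):
        # Assumption: scans are "identical" only if table, columns and filter all match.
        return (self.table, self.columns, self.predicate)


def merge_identical_scans(scans):
    """Merge scans with equal signatures into one work that feeds all their children."""
    merged = {}
    for scan in scans:
        key = scan.signature()
        if key in merged:
            # Redirect this scan's consumers to the already-kept scan.
            merged[key].children.extend(scan.children)
        else:
            # Copy the children list so the input works are not mutated.
            merged[key] = ScanWork(scan.table, scan.columns, scan.predicate, list(scan.children))
    return list(merged.values())


def works_to_cache(works):
    """A work's output is worth caching only if more than one child consumes it."""
    return [w for w in works if len(w.children) > 1]


# Usage: two identical scans of one table plus an unrelated scan.
scans = [
    ScanWork("store_sales", ("ss_item_sk",), "ss_quantity > 10", ["Reducer 2"]),
    ScanWork("store_sales", ("ss_item_sk",), "ss_quantity > 10", ["Reducer 3"]),
    ScanWork("item", ("i_item_sk",), "", ["Reducer 2"]),
]
merged = merge_identical_scans(scans)
print(len(merged))                                   # 2
print([w.table for w in works_to_cache(merged)])     # ['store_sales']
```

Merging alone only removes duplicate operators from the plan; the saving in HoS comes from pairing it with the cache, since the merged scan now has two consumers and its output is reused rather than recomputed.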

Attachments

explain.28.share.false (04/Dec/17 09:40, 17 kB, liyunzhang)
explain.28.share.true (04/Dec/17 09:40, 44 kB, liyunzhang)
HIVE-17486.1.patch (05/Dec/17 06:27, 54 kB, liyunzhang)
HIVE-17486.2.patch (15/Dec/17 08:40, 79 kB, liyunzhang)
HIVE-17486.3.patch (15/Dec/17 17:15, 77 kB, liyunzhang)
HIVE-17486.4.patch (16/Dec/17 09:52, 77 kB, liyunzhang)
HIVE-17486.5.patch (03/Jan/18 01:46, 79 kB, liyunzhang)
scanshare.after.svg (01/Nov/17 09:00, 98 kB, liyunzhang)
scanshare.before.svg (01/Nov/17 09:00, 107 kB, liyunzhang)

Issue Links

is blocked by

HIVE-18289 Support Parquet/Orc format when enable rdd cache in Hive on Spark (Open)

HIVE-18301 Investigate to enable MapInput cache in Hive on Spark (Patch Available)

links to

design doc

People

Assignee: liyunzhang

Reporter: liyunzhang

Votes: 0

Watchers: 6

Dates

Created: 08/Sep/17 08:28

Updated: 03/Jan/18 06:22