[HIVE-8054] Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: Spark
Labels:

Description

The option hive.optimize.union.remove, introduced in HIVE-3276, removes union operators from the operator graph in certain cases as an optimization to reduce the number of MR jobs. While this makes sense for MR, the optimization is actually harmful for an execution engine such as Spark, which natively supports union without requiring additional jobs. This is because removing a union operator creates disjoint operator graphs, each of which generates its own job, so the optimization ends up requiring more jobs to run the query, not to mention the additional complexity of handling linked FS descriptors.

I propose that we disable this optimization when the execution engine is Spark.
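
The proposed behavior can be sketched as a small guard over the two settings named in the title. This is only an illustration, not the code in the attached patches: the class `UnionRemoveGuard`, the method `shouldRunUnionRemove`, and the use of a plain `Map` in place of Hive's configuration object are all hypothetical, and the fallback values (`false` for the optimization, `mr` for the engine) are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive: decides whether the union-remove
// optimization should be applied for a given configuration.
public class UnionRemoveGuard {
    // Property names taken from the issue title.
    static final String ENGINE_KEY = "hive.execution.engine";
    static final String UNION_REMOVE_KEY = "hive.optimize.union.remove";

    // Returns true only when the optimization is requested AND the engine
    // is not Spark. The fallback values used here are assumptions.
    static boolean shouldRunUnionRemove(Map<String, String> conf) {
        boolean requested =
            Boolean.parseBoolean(conf.getOrDefault(UNION_REMOVE_KEY, "false"));
        String engine = conf.getOrDefault(ENGINE_KEY, "mr");
        return requested && !"spark".equalsIgnoreCase(engine);
    }
}
```

With this guard, a configuration that sets hive.optimize.union.remove=true together with hive.execution.engine=spark would skip the optimization, while the same option under MR would still apply it.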

Attachments

- HIVE-8054.3-spark.patch, 15/Sep/14 23:07, 145 kB, Na Yang
- HIVE-8054.2-spark.patch, 15/Sep/14 18:19, 141 kB, Na Yang
- HIVE-8054-spark.patch, 14/Sep/14 06:34, 136 kB, Na Yang

Issue Links

is part of

- HIVE-7292 Hive on Spark (Resolved)

is related to

- HIVE-8073 Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch] (Closed)

relates to

- HIVE-8055 Code cleanup after HIVE-8054 [Spark Branch] (Resolved)
- HIVE-3276 optimize union sub-queries (Closed)

People

Assignee: Na Yang

Reporter: Xuefu Zhang

Votes: 0

Watchers: 3

Dates

Created: 11/Sep/14 18:22

Updated: 29/May/15 07:52

Resolved: 16/Sep/14 02:03