Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
Description
Currently, a multi-table insert generates 1+N tasks: one task that produces the input, plus N insert tasks that each read from that input and write to a separate output table.
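For illustration, a multi-table insert of this shape might look like the following HiveQL (table and column names are hypothetical):

```sql
-- One scan of src feeds two insert branches (N = 2),
-- each writing to a separate output table.
FROM src
INSERT OVERWRITE TABLE dest1
  SELECT key, value
  WHERE key < 100
INSERT OVERWRITE TABLE dest2
  SELECT key, count(value)
  WHERE key >= 100
  GROUP BY key;
```

Here the scan of `src` corresponds to the "1" task, and the two `INSERT` branches correspond to the N tasks described above.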
To make these N tasks run in parallel, we currently rely on hive.exec.parallel being set to true. In this patch, we propose an alternative: combine the N tasks into a single task containing N separate operator trees, which at execution time produce N result RDDs. Spark may then be able to execute these N RDDs in parallel internally, without relying on hive.exec.parallel.