Description
For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we implement this on Spark, it would be nice if all the inserts could happen concurrently.
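For reference, a multi insert statement reads the source once and writes to several destinations; the table names and predicates below are only illustrative:

    FROM src
    INSERT OVERWRITE TABLE dest1 SELECT key, value WHERE key < 100
    INSERT OVERWRITE TABLE dest2 SELECT key, value WHERE key >= 100;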
It seems that Spark doesn't offer this kind of concurrency out of the box. To make things worse, the source of the inserts may be recomputed unless it's staged, and even with staging the inserts happen sequentially, so performance suffers.
This task is to find out what it takes in Spark to enable concurrent inserts without requiring the source to be staged or the inserts to run sequentially. If this has to be solved in Hive instead, find an optimal way to do it. A sketch of one possible direction follows.
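As a starting point, here is a minimal sketch of how concurrent inserts might be submitted from a shared SparkContext, which accepts job submissions from multiple threads (the question HIVE-7525 investigates). Everything here is an assumption for illustration: the paths, the filter predicates, and the use of persist() as a stand-in for staging. Note that persist() does not fully prevent recomputation, since evicted partitions are rebuilt from the lineage, which is why avoiding staging altogether is part of this task.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel
    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    object ConcurrentInsertsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("multi-insert-sketch"))

        // Shared source of the multi insert; persisted so each job reads the
        // already-computed partitions instead of re-running the scan.
        // (Hypothetical path; persist() stands in for proper staging.)
        val source = sc.textFile("/warehouse/src").persist(StorageLevel.MEMORY_AND_DISK)

        // Each insert becomes its own Spark job. SparkContext is thread-safe
        // for job submission, so wrapping each action in a Future lets the
        // jobs run concurrently, subject to scheduler configuration.
        val insert1 = Future { source.filter(_.contains("a")).saveAsTextFile("/warehouse/dest1") }
        val insert2 = Future { source.filter(_.contains("b")).saveAsTextFile("/warehouse/dest2") }

        // Block until both inserts finish, then shut down.
        Await.result(Future.sequence(Seq(insert1, insert2)), Duration.Inf)
        sc.stop()
      }
    }

With the default FIFO scheduler, concurrent submission alone may not yield much overlap; setting spark.scheduler.mode=FAIR lets the jobs share executors instead of queueing.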
Attachments
Issue Links
- is blocked by
  - SPARK-2688 Need a way to run multiple data pipeline concurrently (Resolved)
- is depended upon by
  - HIVE-7842 Enable qtest load_dyn_part1.q [Spark Branch] (Resolved)
  - HIVE-8233 multi-table insertion doesn't work with ForwardOperator [Spark Branch] (Resolved)
  - HIVE-8208 Multi-table insertion optimization #1: don't always break operator tree. [Spark Branch] (Resolved)
  - HIVE-8215 Multi-table insertion optimization #3: use 1+1 tasks instead of 1+N tasks [Spark Branch] (Resolved)
  - HIVE-8209 Multi-table insertion optimization #2: use separate context [Spark Branch] (Resolved)
  - HIVE-8207 Add .q tests for multi-table insertion [Spark Branch] (Resolved)
- relates to
  - HIVE-7731 Incorrect result returned when a map work has multiple downstream reduce works [Spark Branch] (Resolved)
  - HIVE-8438 Clean up code introduced by HIVE-7503 and such [Spark Plan] (Resolved)
  - HIVE-8219 Multi-Insert optimization, don't sink the source into a file [Spark Branch] (Resolved)
  - HIVE-8220 Refactor multi-insert code such that plan splitting and task generation are modular and reusable [Spark Branch] (Resolved)
- requires
  - HIVE-7525 Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch] (Resolved)