[HIVE-8219] Multi-Insert optimization, don't sink the source into a file [Spark Branch] - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: Spark
Labels:
- Spark-M1

Description

The current implementation splits the operator plan at the lowest common ancestor by inserting one FileSinkOperator and a list of TableScanOperators. Writing to a file (by the FileSinkOperator) is expensive. We should be able to insert a ReduceSinkOperator instead. The resulting RDD from the first job can be cached and referenced in subsequent Spark jobs.

This is a follow-up to HIVE-7503.
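
The trade-off described above can be illustrated with a small, self-contained sketch. It is a toy model in plain Python, not Hive or Spark code: `shared_subplan`, `multi_insert_via_file`, and `multi_insert_via_cache` are hypothetical names, the temp file stands in for the FileSinkOperator/TableScanOperator pair, and the in-memory list stands in for a cached RDD. It assumes the shared part of the plan is a simple filter and that each insert branch is a function over the shared rows.

```python
import json
import os
import tempfile


def shared_subplan(rows, counter):
    """Hypothetical common ancestor of all insert branches (here: a filter)."""
    counter["runs"] += 1  # track how often the shared work is executed
    return [r for r in rows if r["amount"] > 0]


def multi_insert_via_file(rows, branches, counter):
    """File-sink strategy: write the shared result to disk, re-read it per branch."""
    shared = shared_subplan(rows, counter)
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(shared, f)  # stands in for the file sink
        results = []
        for branch in branches:
            with open(path) as f:  # stands in for one table scan per branch
                results.append(branch(json.load(f)))
        return results
    finally:
        os.remove(path)


def multi_insert_via_cache(rows, branches, counter):
    """Cache strategy: keep the shared result in memory and reuse it directly."""
    shared = shared_subplan(rows, counter)  # analogous to a cached RDD
    return [branch(shared) for branch in branches]


if __name__ == "__main__":
    rows = [
        {"k": "a", "amount": 5},
        {"k": "b", "amount": -1},
        {"k": "a", "amount": 3},
    ]
    branches = [
        lambda rs: sum(r["amount"] for r in rs),  # e.g. insert into a totals table
        lambda rs: sorted({r["k"] for r in rs}),  # e.g. insert into a keys table
    ]
    print(multi_insert_via_file(rows, branches, {"runs": 0}))
    print(multi_insert_via_cache(rows, branches, {"runs": 0}))
```

Both strategies run the shared work once and produce the same per-branch results; the difference is that the file variant pays for one write plus one read per branch, which is the cost the description proposes to avoid.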

Issue Links

is part of

HIVE-7292 Hive on Spark

Resolved

is related to

HIVE-7503 Support Hive's multi-table insert query with Spark [Spark Branch]

Resolved

People

Assignee: Unassigned

Reporter: Xuefu Zhang

Votes: 0

Watchers: 1

Dates

Created: 22/Sep/14 20:19

Updated: 06/Nov/14 04:27

Resolved: 06/Nov/14 04:27