[SPARK-18243] Sub-task of SPARK-15691: Refactor and improve Hive support

Converge the insert path of Hive tables with data source tables


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Inserting data into Hive tables currently has its own implementation, distinct from the data source write path: InsertIntoHiveTable, SparkHiveWriterContainer, and SparkHiveDynamicPartitionWriterContainer.

      I think it should be possible to unify these with the data source implementation, InsertIntoHadoopFsRelationCommand. We can start by implementing an OutputWriterFactory/OutputWriter that uses Hive's SerDes to write data, as sketched below.
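
      As a hedged sketch of that idea (the class name, constructor parameters, and row type below are illustrative assumptions, not the final API), such a writer would serialize each row through the table's SerDe and hand the resulting Writable to Hive's record writer:

{code:scala}
import org.apache.hadoop.hive.ql.exec.FileSinkOperator
import org.apache.hadoop.hive.serde2.Serializer
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector

// Hypothetical writer: all names here are assumptions for illustration.
// A real implementation would extend Spark's OutputWriter and convert each
// InternalRow field into the Java objects the ObjectInspector describes.
class HiveSerDeWriter(
    serializer: Serializer,                       // instantiated from the table's SerDe class
    recordWriter: FileSinkOperator.RecordWriter,  // Hive's writer for the table's output format
    inspector: ObjectInspector) {                 // describes the row layout to the SerDe

  // Serialize one row with the SerDe, then pass the Writable to Hive's writer.
  def write(row: AnyRef): Unit =
    recordWriter.write(serializer.serialize(row, inspector))

  // Hive's RecordWriter takes an "abort" flag on close; false means commit normally.
  def close(): Unit = recordWriter.close(false)
}
{code}

      An OutputWriterFactory counterpart would then construct one such writer per output file, wiring in the SerDe and output format recorded in the table's storage properties.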

      Note that one other major difference is the commit protocol: data source tables write directly to the final destination without a staging directory, and Spark itself then adds the partitions/tables to the catalog. Hive tables write to a staging directory first, and then call the Hive metastore's loadPartition/loadTable functions to load that data in.
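
      To make that difference concrete, here is a minimal self-contained model of the two commit flows (the paths, file names, and the move-based "load" are simplifications for illustration; the real Hive path goes through the metastore's loadTable/loadPartition):

{code:scala}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

object InsertFlows {
  // Data source tables: tasks write directly under the final table location;
  // Spark itself then registers the new partition/table with the catalog
  // (the metastore call is elided here).
  def dataSourceStyle(tableDir: Path): Unit = {
    val partition = tableDir.resolve("ds=2016-11-02")
    Files.createDirectories(partition)
    Files.write(partition.resolve("part-00000"), "row\n".getBytes(StandardCharsets.UTF_8))
  }

  // Hive tables: tasks write under a staging directory first; loadTable/
  // loadPartition then moves the files into the final location and updates
  // the metastore. The move below stands in for that load call.
  def hiveStyle(tableDir: Path, stagingDir: Path): Unit = {
    Files.createDirectories(stagingDir)
    Files.write(stagingDir.resolve("part-00000"), "row\n".getBytes(StandardCharsets.UTF_8))
    Files.createDirectories(tableDir)
    Files.move(
      stagingDir.resolve("part-00000"),
      tableDir.resolve("part-00000"),
      StandardCopyOption.REPLACE_EXISTING)
  }
}
{code}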

          People

            Assignee: Wenchen Fan (cloud_fan)
            Reporter: Reynold Xin (rxin)
            Votes: 0
            Watchers: 6
