[SPARK-6906] Improve Hive integration support - ASF JIRA


Details

Type: Story
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.0
Component/s: SQL
Labels: None
Target Version/s: 1.5.0
Sprint: Spark 1.5 release

Description

Right now Spark SQL is tightly coupled to a specific version of Hive for two primary reasons.

Metadata: we use the Hive Metastore client to retrieve information about tables in a metastore.
Execution: UDFs, UDAFs, SerDes, HiveConf and various helper functions for configuration.

Since Hive is generally not compatible across versions, we currently maintain fairly expensive shim layers that let us talk to both Hive 0.12 and Hive 0.13 metastores. Ideally we would be able to talk to more versions of Hive with less maintenance burden.

This task proposes separating the Hive version used to communicate with the metastore from the version used for execution. Doing so lets us significantly reduce the size of the shim, since it would only need to provide compatibility for metadata operations. All execution would be done with a single version of Hive (the newest version supported by Spark SQL).
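To make the proposed split concrete, here is a minimal sketch in Python, not Spark's actual Scala implementation. Every name in it (`TableMeta`, `MetastoreShim`, `Hive12Shim`, `Hive13Shim`, `make_metastore_client`, `plan_scan`) is hypothetical, and the "raw" metastore responses are invented stand-ins rather than real Hive client APIs. It illustrates the idea that only a small metadata interface varies by Hive version, while execution has a single code path.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TableMeta:
    """Version-neutral table metadata handed to the execution side."""
    database: str
    name: str
    columns: Tuple[Tuple[str, str], ...]  # (column name, Hive type) pairs
    location: str


class MetastoreShim(ABC):
    """Narrow, metadata-only interface; in this sketch the only per-version code."""

    @abstractmethod
    def get_table(self, database: str, name: str) -> TableMeta:
        ...


class Hive12Shim(MetastoreShim):
    # Hypothetical: a dict keyed by "db.table" stands in for a 0.12 metastore.
    def __init__(self, raw_tables: Dict[str, dict]):
        self._raw = raw_tables

    def get_table(self, database: str, name: str) -> TableMeta:
        raw = self._raw[f"{database}.{name}"]
        cols = tuple((c["name"], c["type"]) for c in raw["cols"])
        return TableMeta(database, name, cols, raw["location"])


class Hive13Shim(MetastoreShim):
    # Hypothetical: a dict keyed by (db, table) -> (schema, storage) stands in
    # for a 0.13 metastore that returns differently shaped results.
    def __init__(self, raw_tables: Dict[Tuple[str, str], tuple]):
        self._raw = raw_tables

    def get_table(self, database: str, name: str) -> TableMeta:
        schema, storage = self._raw[(database, name)]
        return TableMeta(database, name, tuple(schema), storage["path"])


# Supporting another metastore version means adding one entry here.
SHIMS = {"0.12": Hive12Shim, "0.13": Hive13Shim}


def make_metastore_client(version: str, backend) -> MetastoreShim:
    try:
        shim_cls = SHIMS[version]
    except KeyError:
        raise ValueError(f"unsupported metastore version: {version}") from None
    return shim_cls(backend)


def plan_scan(client: MetastoreShim, database: str, name: str) -> str:
    """Execution side: one code path, independent of the metastore version."""
    meta = client.get_table(database, name)
    cols = ", ".join(col for col, _ in meta.columns)
    return f"Scan {meta.location} [{cols}]"


# Usage: the same table described by two differently shaped metastores.
old = make_metastore_client("0.12", {
    "default.src": {
        "cols": [{"name": "key", "type": "int"},
                 {"name": "value", "type": "string"}],
        "location": "/warehouse/src",
    },
})
new = make_metastore_client("0.13", {
    ("default", "src"): ([("key", "int"), ("value", "string")],
                         {"path": "/warehouse/src"}),
})
print(plan_scan(old, "default", "src"))  # Scan /warehouse/src [key, value]
print(plan_scan(new, "default", "src"))  # Scan /warehouse/src [key, value]
```

Both clients yield the same `TableMeta`, so `plan_scan` produces the same plan whichever metastore version is configured. In this model, talking to an additional metastore version only requires one more small shim class; the execution code is untouched.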


Issue Links

links to

[Github] Pull Request #6167 (marmbrus)

Sub-Tasks

There are no Sub-Tasks for this issue.


People

Assignee: Michael Armbrust

Reporter: Michael Armbrust

Votes: 6

Watchers: 22

Dates

Created: 14/Apr/15 18:09

Updated: 27/Aug/15 13:02

Resolved: 05/Aug/15 19:35