[SPARK-21769] Add a table option for Hive-serde tables to make Spark always respect schemas inferred by Spark SQL - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.3.0
Component/s: SQL
Labels: None

Description

For Hive-serde tables, we always respect the schema stored in the Hive metastore, because the schema could be altered by other engines that share the same metastore. Thus, whenever the two schemas differ (ignoring nullability and case), we trust the metastore-controlled schema for Hive-serde tables. However, in some scenarios the Hive metastore can also INCORRECTLY overwrite the schema, namely when the table's serde differs from the Hive metastore's built-in serde.

The proposed solution is to introduce a table property for such scenarios. For a specific Hive-serde table, users can manually set this table property to ask Spark to always respect the Spark-inferred schema instead of trusting the metastore-controlled schema. The property is off by default.
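
The decision described above can be modeled in a few lines. The following is only an illustrative sketch in plain Python, not Spark's implementation: the property key `respectSparkSchema`, the tuple-based schema model, and the function names are all assumptions made for this example. It also assumes that when the two schemas agree apart from case and nullability, the Spark-inferred schema is the one returned.

```python
from typing import Dict, List, Tuple

# A schema is modeled as a list of (column_name, type_name, nullable) tuples.
Column = Tuple[str, str, bool]
Schema = List[Column]

# Hypothetical property key, chosen for this illustration only.
RESPECT_SPARK_SCHEMA_KEY = "respectSparkSchema"


def same_ignoring_case_and_nullability(a: Schema, b: Schema) -> bool:
    """Compare schemas by column name (case-insensitive) and type only."""
    if len(a) != len(b):
        return False
    return all(
        col_a[0].lower() == col_b[0].lower() and col_a[1] == col_b[1]
        for col_a, col_b in zip(a, b)
    )


def resolve_schema(
    metastore_schema: Schema,
    spark_schema: Schema,
    table_properties: Dict[str, str],
) -> Schema:
    """Pick the schema to use for a Hive-serde table.

    The metastore schema wins on a real mismatch unless the table
    property explicitly asks for the Spark-inferred schema.
    """
    # The option is off by default: a missing key is treated as "false".
    respect_spark = (
        table_properties.get(RESPECT_SPARK_SCHEMA_KEY, "false").lower() == "true"
    )
    if respect_spark:
        return spark_schema
    if same_ignoring_case_and_nullability(metastore_schema, spark_schema):
        # Assumption: differences in case or nullability alone do not
        # count as a conflict, so the Spark-inferred schema is kept.
        return spark_schema
    return metastore_schema
```

With the property unset, a table whose metastore schema has a different column type than the Spark-inferred one resolves to the metastore schema; setting the property to `true` on that table flips the result to the Spark-inferred schema.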

People

Assignee: Xiao Li

Reporter: Xiao Li

Votes: 0

Watchers: 3

Dates

Created: 17/Aug/17 22:56

Updated: 22/Aug/17 20:18

Resolved: 22/Aug/17 20:18