SPARK-16628

OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: SQL
    • Labels: None

      Description

      When spark.sql.hive.convertMetastoreOrc is enabled, we convert an ORC table represented by a MetastoreRelation to a HadoopFsRelation that uses Spark's OrcFileFormat internally. This conversion aims to improve table-scan performance, since the code path for scanning a HadoopFsRelation is faster at runtime. However, OrcFileFormat's implementation assumes that ORC files store their schema with correct column names, and before Hive 2.0, an ORC table created by Hive does not store column names correctly in the ORC files (HIVE-4243). So, for this kind of ORC dataset, we cannot really convert the code path.

      Right now, if ORC tables are created by Hive 1.x or 0.x, enabling spark.sql.hive.convertMetastoreOrc introduces a runtime exception for non-partitioned ORC tables and drops the metastore schema for partitioned ORC tables.
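
      For tables affected by this mismatch, a workaround sketch is to disable the conversion so Spark falls back to the Hive SerDe scan path. The snippet below assumes an existing SparkSession named `spark` with Hive support enabled:

      ```scala
      // Sketch: disable the MetastoreRelation-to-HadoopFsRelation conversion for ORC,
      // so Spark reads the table through Hive's ORC SerDe instead of OrcFileFormat.
      spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")

      // Equivalent SQL form:
      // SET spark.sql.hive.convertMetastoreOrc=false;
      ```

      This trades the faster HadoopFsRelation scan path for correct results on ORC files whose stored schema does not match the metastore schema.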


              People

              • Assignee: dongjoon Dongjoon Hyun
              • Reporter: yhuai Yin Huai
              • Votes: 2
              • Watchers: 14
