Description
Spark does not seem to read the schema from the Hive metastore for partitioned tables stored as ORC files. Instead it appears to read the schema from the ORC files themselves, which, for tables created with Hive, does not match the metastore schema (at least not before Hive 2.0; see HIVE-4243). To reproduce:
In Hive:
hive> create table default.test (id BIGINT, name STRING) partitioned by (state STRING) stored as orc;
hive> insert into table default.test partition (state="CA") values (1, "mike"), (2, "steve"), (3, "bill");
In Spark:
scala> spark.table("default.test").printSchema
Expected result: Spark should preserve the column names that were defined in Hive.
Actual result:
root
 |-- _col0: long (nullable = true)
 |-- _col1: string (nullable = true)
 |-- state: string (nullable = true)
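This is consistent with the description above: reading the partition directory directly with the ORC data source, bypassing the metastore, appears to show the physical _colN names written by Hive. The warehouse path below is an assumption based on the default Hive warehouse layout for default.test:

scala> // read the ORC files under the partition directory directly (no metastore involved)
scala> spark.read.orc("/user/hive/warehouse/test/state=CA").printSchema

which should print something like:

root
 |-- _col0: long (nullable = true)
 |-- _col1: string (nullable = true)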
Possibly related to SPARK-14959?
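A possible workaround (a sketch, not verified here) is to disable the ORC conversion so that Spark falls back to the Hive SerDe reader, which uses the metastore schema rather than the schema embedded in the files:

scala> // fall back to the Hive SerDe path; columns are then mapped by position, not by name
scala> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
scala> spark.table("default.test").printSchema

With the conversion disabled, the column names defined in Hive (id, name) should be preserved. The setting can also be passed at startup as --conf spark.sql.hive.convertMetastoreOrc=false.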
Issue Links
- blocks
  - SPARK-20901 Feature parity for ORC with Parquet (Open)
- is broken by
  - SPARK-14070 Use ORC data source for SQL queries on ORC tables (Resolved)
- relates to
  - SPARK-16628 OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files (Resolved)