Description
Spark does not seem to read the schema from the Hive metastore for partitioned tables stored as ORC files. Instead it appears to read the schema from the ORC files themselves, which, for tables created with Hive, does not match the metastore schema (at least not before Hive 2.0; see HIVE-4243). To reproduce:
In Hive:
hive> create table default.test (id BIGINT, name STRING) partitioned by (state STRING) stored as orc;
hive> insert into table default.test partition (state="CA") values (1, "mike"), (2, "steve"), (3, "bill");
In Spark:
scala> spark.table("default.test").printSchema
Expected result: Spark should preserve the column names that were defined in Hive.
Actual result:
root
 |-- _col0: long (nullable = true)
 |-- _col1: string (nullable = true)
 |-- state: string (nullable = true)
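This is consistent with the description above: reading the partition directory directly with the ORC data source, bypassing the metastore, appears to show the physical _colN names written by Hive. The warehouse path below is an assumption based on the default Hive warehouse layout for default.test:

scala> // read the ORC files under the partition directory directly (no metastore involved)
scala> spark.read.orc("/user/hive/warehouse/test/state=CA").printSchema

which should print something like:

root
 |-- _col0: long (nullable = true)
 |-- _col1: string (nullable = true)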
Possibly related to SPARK-14959?
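A possible workaround (a sketch, not verified here) is to disable the ORC conversion so that Spark falls back to the Hive SerDe reader, which uses the metastore schema rather than the schema embedded in the files:

scala> // fall back to the Hive SerDe path; columns are then mapped by position, not by name
scala> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
scala> spark.table("default.test").printSchema

With the conversion disabled, the column names defined in Hive (id, name) should be preserved. The setting can also be passed at startup as --conf spark.sql.hive.convertMetastoreOrc=false.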
Issue Links
- blocks
  - SPARK-20901 Feature parity for ORC with Parquet (Open)
- is broken by
  - SPARK-14070 Use ORC data source for SQL queries on ORC tables (Resolved)
- relates to
  - SPARK-16628 OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files (Resolved)