[SPARK-11412] Support merge schema for ORC - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.3, 2.0.0, 2.1.1, 2.2.0, 2.3.4, 2.4.5
Fix Version/s: 3.0.0
Component/s: SQL
Labels: None

Description

I tried to load partitioned ORC files whose schemas differ slightly in a nested column. In the earlier partitions, the column looks like this:

– request: struct (nullable = true)
	– datetime: string (nullable = true)
	– host: string (nullable = true)
	– ip: string (nullable = true)
	– referer: string (nullable = true)
	– request_uri: string (nullable = true)
	– uri: string (nullable = true)
	– useragent: string (nullable = true)

In the later partitions, request also has a page_url_lists attribute.

I tried to load the data with

val s = sqlContext.read.format("orc").option("mergeSchema", "true").load("/data/warehouse/xxxx")

but the resulting schema doesn't show request.page_url_lists.
Does schema merging not work for ORC?
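
For illustration, here is a minimal sketch of the scenario described above, assuming Spark 3.0.0 or later (the fix version of this issue), where the ORC data source accepts the mergeSchema read option. The object name, case classes, field subset, and temporary paths are hypothetical stand-ins for the reporter's data, not taken from the original report. The sketch only inspects the merged schema and does not read any rows back.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

object OrcMergeSchemaSketch {
  // Hypothetical, trimmed-down versions of the `request` struct from the report.
  case class RequestV1(host: String, uri: String)
  case class RequestV2(host: String, uri: String, page_url_lists: Seq[String])
  case class EventV1(request: RequestV1)
  case class EventV2(request: RequestV2)

  /** Writes two ORC partitions with different nested schemas under a temp
    * directory and returns the merged schema of the `request` column. */
  def mergedRequestSchema(spark: SparkSession): StructType = {
    import spark.implicits._
    val base = Files.createTempDirectory("orc-merge-sketch").toString

    // Earlier partition: `request` has no page_url_lists field.
    Seq(EventV1(RequestV1("a.example", "/x"))).toDS()
      .write.orc(s"$base/dt=1")
    // Later partition: `request` gains page_url_lists.
    Seq(EventV2(RequestV2("b.example", "/y", Seq("/p1")))).toDS()
      .write.orc(s"$base/dt=2")

    // Ask the ORC source to merge the schemas of all files (Spark 3.0.0+).
    val merged = spark.read.option("mergeSchema", "true").orc(base)
    merged.schema("request").dataType.asInstanceOf[StructType]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-merge-schema-sketch")
      .getOrCreate()
    try println(mergedRequestSchema(spark).treeString)
    finally spark.stop()
  }
}
```

In Spark 3.0.0 and later, the same behavior can also be enabled for a whole session by setting spark.sql.orc.mergeSchema to true. The per-read option shown here takes precedence over that setting.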

Issue Links

blocks: SPARK-20901 Feature parity for ORC with Parquet (Open)
is duplicated by: SPARK-21019 read orc when some of the columns are missing in some files (Resolved)
links to: GitHub Pull Request #24043

People

Assignee: EdisonWang
Reporter: Dave
Votes: 7
Watchers: 10

Dates

Created: 29/Oct/15 23:10
Updated: 10/Mar/20 08:04
Resolved: 30/Jun/19 00:12