Details
Description
I tried to load partitioned ORC files whose schemas differ slightly in a nested column. Say the column is
 |-- request: struct (nullable = true)
 |    |-- datetime: string (nullable = true)
 |    |-- host: string (nullable = true)
 |    |-- ip: string (nullable = true)
 |    |-- referer: string (nullable = true)
 |    |-- request_uri: string (nullable = true)
 |    |-- uri: string (nullable = true)
 |    |-- useragent: string (nullable = true)
and the later partitions have an additional page_url_lists attribute under request.
I tried to load the data with
val s = sqlContext.read.format("orc").option("mergeSchema", "true").load("/data/warehouse/xxxx")
but the resulting schema doesn't show request.page_url_lists.
Does schema merging not work for ORC?
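One possible workaround, sketched here only for illustration, is to skip schema inference and pass the expected schema to the reader explicitly. The type of page_url_lists is not given in this report, so array<string> below is a placeholder assumption, and only the request column is declared. Whether the ORC reader then returns null for the field in the older partitions depends on the Spark version; this has not been verified against the data above.

```scala
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Fields of the nested `request` struct exactly as listed in the report.
val baseRequestFields = Seq(
  "datetime", "host", "ip", "referer", "request_uri", "uri", "useragent"
).map(name => StructField(name, StringType, nullable = true))

// ASSUMPTION: the report does not state the type of page_url_lists;
// array<string> is a placeholder and must be replaced with the real type.
val pageUrlLists = StructField("page_url_lists", ArrayType(StringType), nullable = true)

// The struct as it should look once the newer field is included.
val requestType = StructType(baseRequestFields :+ pageUrlLists)

// Only `request` is declared here; a real table would list its other columns too.
val expectedSchema = StructType(Seq(StructField("request", requestType, nullable = true)))

// Usage (needs a live SQLContext and the data from the report):
// val s = sqlContext.read.format("orc").schema(expectedSchema).load("/data/warehouse/xxxx")
// s.printSchema()
```

Supplying the schema this way sidesteps the question of which file's footer the reader picks up, at the cost of keeping the schema definition in sync with the data by hand.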
Issue Links
- blocks
  SPARK-20901 Feature parity for ORC with Parquet (Open)
- is duplicated by
  SPARK-21019 read orc when some of the columns are missing in some files (Resolved)
- links to