[SPARK-14070] Use ORC data source for SQL queries on ORC tables


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, if one queries ORC tables in Hive, the plan generated by Spark shows that it uses the `HiveTableScan` operator, which is generic to all file formats. We could instead use the ORC data source so that we get ORC-specific optimizations like predicate pushdown.

      Current behaviour:

      ```
      scala> hqlContext.sql("SELECT * FROM orc_table").explain(true)
      == Parsed Logical Plan ==
      'Project [unresolvedalias(*, None)]
      +- 'UnresolvedRelation `orc_table`, None

      == Analyzed Logical Plan ==
      key: string, value: string
      Project [key#171,value#172]
      +- MetastoreRelation default, orc_table, None

      == Optimized Logical Plan ==
      MetastoreRelation default, orc_table, None

      == Physical Plan ==
      HiveTableScan [key#171,value#172], MetastoreRelation default, orc_table, None
      ```
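
      For comparison, here is a minimal sketch of how one could opt into the ORC data source path today using the existing `spark.sql.hive.convertMetastoreOrc` and `spark.sql.orc.filterPushdown` settings (the exact operator names in the resulting plan vary by version):

      ```
      scala> // Ask the planner to convert metastore ORC tables to the ORC data source
      scala> hqlContext.setConf("spark.sql.hive.convertMetastoreOrc", "true")

      scala> // Allow filters to be pushed down into the ORC reader
      scala> hqlContext.setConf("spark.sql.orc.filterPushdown", "true")

      scala> // With both set, the physical plan should scan via the ORC relation
      scala> // (with pushed filters) instead of the generic HiveTableScan
      scala> hqlContext.sql("SELECT * FROM orc_table WHERE key = '0'").explain(true)
      ```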

    People

    • Assignee: tejasp Tejas Patil
    • Reporter: tejasp Tejas Patil
    • Shepherd: Michael Armbrust

