[SPARK-4413] Parquet support through datasource API - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.2.0
Component/s: SQL
Labels: None
Target Version/s: 1.2.0

Description

Right now there are several issues with our Parquet support. Specifically, the only way to access Parquet files through pure SQL is by including Hive, which has the following issues:

fairly verbose syntax
requires you to explicitly add partitions
does not support decimal types
querying tables with many partitions results in metadata operations dominating the query time (even worse when reading from S3)

It would be great to have better native support here through the new data sources API. Ideally, once that is in place, we can deprecate the existing ParquetRelation.
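
To illustrate the "explicitly add partitions" item in the list above: a native reader could, in principle, infer partitions from the directory layout instead of requiring each one to be registered. The following is only a minimal sketch of that idea, assuming Hive-style `key=value` directory names. The function `discover_partitions` and the example paths are hypothetical and are not taken from Spark or from the linked pull request.

```python
from typing import Dict, List, Tuple


def discover_partitions(base: str, files: List[str]) -> List[Tuple[str, Dict[str, str]]]:
    """Pair each file path with the partition values encoded in its directories.

    Hypothetical helper: assumes Hive-style `key=value` directory names.
    """
    prefix = base.rstrip("/") + "/"
    discovered = []
    for path in files:
        if not path.startswith(prefix):
            continue  # skip anything outside the table's base directory
        # Drop the file name; only directory components can hold partition keys.
        directories = path[len(prefix):].split("/")[:-1]
        values = {}
        for component in directories:
            key, sep, value = component.partition("=")
            if sep and key:  # only key=value directories count as partitions
                values[key] = value
        discovered.append((path, values))
    return discovered


# Example listing (made-up paths) as a file system scan might return it.
listing = [
    "/data/events/year=2014/month=11/part-0.parquet",
    "/data/events/year=2014/month=12/part-0.parquet",
]
partitions = discover_partitions("/data/events", listing)
for path, values in partitions:
    print(path, values)
```

Values are kept as strings here; a real implementation would also need to infer column types and would still have to list the directories, so this sketch does not address the metadata cost mentioned for tables with many partitions.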

Issue Links

links to

[Github] Pull Request #3269 (marmbrus)

People

Assignee: Michael Armbrust

Reporter: Michael Armbrust

Votes: 0

Watchers: 2

Dates

Created: 14/Nov/14 20:44

Updated: 21/Nov/14 02:31

Resolved: 21/Nov/14 02:31