Spark / SPARK-6904

SparkSql - HiveContext - optimize reading partition data from metastore


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I was trying out Spark SQL using the HiveContext, doing a select on a partitioned table with a large number of partitions (16,000+). It took over 6 minutes before the job even started. It looks like it was querying the Hive metastore and getting a good chunk of data back, which I'm guessing is info on the partitions. Running the same query in Hive takes 45 seconds for the entire job.

      It would be nice if we could prune the partition metadata read from the metastore so that only the partitions a query actually needs are fetched (see the sketch below).
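      For illustration, a minimal sketch of the kind of query involved, using the Spark 1.3-era HiveContext API. The table name ("events") and partition column ("dt") are hypothetical; the point is that even a query touching a single partition currently waits while metadata for all 16,000+ partitions is fetched from the metastore.

      {code:scala}
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.hive.HiveContext

      object PartitionPruneSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("partition-prune-sketch"))
          // HiveContext resolves table and partition metadata via the Hive metastore.
          val hiveContext = new HiveContext(sc)

          // Hypothetical table "events", partitioned by "dt". The filter selects a
          // single partition, but before the job starts Spark still pulls metadata
          // for every partition of the table.
          val df = hiveContext.sql("SELECT count(*) FROM events WHERE dt = '2015-04-01'")
          df.show()
        }
      }
      {code}

      One way to optimize would be to push the predicate on the partition column down into the metastore lookup so that only the matching partitions are listed.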

      Attachments

        Issue Links

        Activity


          People

            Assignee: Unassigned
            Reporter: Thomas Graves (tgraves)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
