[SPARK-17179] Consider improving partition pruning in HiveMetastoreCatalog - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.0.0
Fix Version/s: 2.1.0
Component/s: SQL
Labels:
None

Target Version/s:

2.1.0

Description

Issue:

Create an external table with 1000s of partition
Running simple query with partition details ends up listing all files for caching in ListingFileCatalog. This would turn out to be very slow in cloud based FS access (e.g S3). Even though, ListingFileCatalog supports
multi-threading, it would end up unncessarily listing 1000+ files when user is just interested in 1 partition.
This adds up additional overhead in HiveMetastoreCatalog as it queries all partitions in convertToLogicalRelation (metastoreRelation.getHiveQlPartitions()). Partition related details
are not passed in here, so ends up overloading hive metastore.
Also even if any partition changes, cache would be dirtied and have to be re-populated. It would be nice to prune the partitions in metastore layer itself, so that few partitions are looked up via FileSystem and only few items are cached.

"CREATE EXTERNAL TABLE `ca_par_ext`(
  `customer_id` bigint,
  `account_id` bigint)
PARTITIONED BY (
  `effective_date` date)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3a://bucket_details/ca_par'"

explain select count(*) from ca_par_ext where effective_date between '2015-12-17' and '2015-12-18';

Attachments

Issue Links

duplicates

SPARK-16980 Load only catalog table partition metadata required to answer a query

Resolved

is depended upon by

HADOOP-13525 Optimize uses of FS operations in the ASF analysis frameworks and libraries

Resolved

Activity

People

Assignee:: Unassigned

Reporter:: Rajesh Balamohan

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 22/Aug/16 05:06

Updated:: 01/Nov/16 21:12

Resolved:: 01/Nov/16 21:12