[SPARK-17861] Store data source partitions in metastore and push partition pruning into metastore - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.1.0
Component/s: SQL
Labels: None
Target Version/s: 2.1.0

Description

Spark SQL does not store any partition information in the catalog for data source tables, because it was initially designed to work with arbitrary files. For catalog tables, however, this causes a few issues:

1. Listing partitions for a large table (with millions of partitions) can be very slow during cold start.
2. Heterogeneous partition naming schemes are not supported.
3. Partition pruning cannot be pushed into the metastore.

This ticket tracks the work required to push the tracking of partitions into the metastore. This change should be feature flagged.
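As a rough illustration of the third point, the sketch below models what pushing partition pruning into the metastore means: the filter is evaluated against partition metadata held in a catalog, so only matching locations are returned and no directory listing is needed to discover partitions. This is a toy model, not Spark's implementation. `ToyMetastore`, `add_partition`, `get_partitions_by_filter`, and all paths are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Tuple

# A partition spec maps partition column names to values, e.g. {"ds": "2016-10-11"}.
PartitionSpec = Dict[str, str]


class ToyMetastore:
    """Hypothetical in-memory stand-in for a metastore that tracks partitions."""

    def __init__(self) -> None:
        self._partitions: List[Tuple[PartitionSpec, str]] = []

    def add_partition(self, spec: PartitionSpec, location: str) -> None:
        # The location is stored explicitly, so it need not follow a
        # column=value directory naming scheme.
        self._partitions.append((dict(spec), location))

    def get_partitions_by_filter(
        self, predicate: Callable[[PartitionSpec], bool]
    ) -> List[str]:
        # The predicate runs against catalog metadata only; no file system
        # listing is performed to find the matching partitions.
        return [loc for spec, loc in self._partitions if predicate(spec)]


metastore = ToyMetastore()
for day in ("2016-10-09", "2016-10-10", "2016-10-11"):
    metastore.add_partition({"ds": day}, f"/warehouse/events/ds={day}")
# A partition whose directory does not follow the ds=<value> convention.
metastore.add_partition({"ds": "2016-10-12"}, "/archive/events/oct12")

# Only partitions satisfying the filter are returned to the caller.
selected = metastore.get_partitions_by_filter(lambda spec: spec["ds"] >= "2016-10-11")
print(selected)
```

Because each partition carries its own location in the catalog, the same lookup also covers the second point: a partition stored under a non-standard path is found without relying on directory names.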

Issue Links

is depended upon by

HADOOP-13525 Optimize uses of FS operations in the ASF analysis frameworks and libraries

Resolved

is related to

SPARK-15691 Refactor and improve Hive support

Resolved

relates to

SPARK-16980 Load only catalog table partition metadata required to answer a query

Resolved

Sub-Tasks

1.	Load only catalog table partition metadata required to answer a query	Resolved	Michael MacFadden
2.	Feature flag SPARK-16980	Resolved	Eric Liang
3.	Refactor FileCatalog classes to simplify the inheritance tree	Resolved	Eric Liang
4.	Fix refreshByPath for converted Hive tables	Resolved	Eric Liang
5.	Enable metastore partition pruning for unconverted hive tables by default	Resolved	Eric Liang
6.	Add back a file status cache for catalog tables	Resolved	Eric Liang
7.	should not always lowercase partition columns of partition spec in parser	Resolved	Wenchen Fan
8.	Use metastore for managing filesource table partitions as well	Resolved	Wenchen Fan
9.	put hive serde table schema to table properties like data source table	Resolved	Wenchen Fan
10.	Can't filter over mixed case parquet columns of converted Hive tables	Resolved	Wenchen Fan
11.	Optimize insert to not require REPAIR TABLE	Resolved	Eric Liang
12.	ExternalCatalogSuite should test with mixed case fields	Resolved	Wenchen Fan
13.	Avoid using Union to chain together create table and repair partition commands	Resolved	Eric Liang
14.	INSERT OVERWRITE TABLE ... PARTITION will overwrite the entire Datasource table instead of just the specified partition	Resolved	Eric Liang
15.	INSERT [INTO|OVERWRITE] TABLE ... PARTITION for Datasource tables cannot handle partitions with custom locations	Resolved	Eric Liang
16.	data source tables should support truncating partition	Resolved	Wenchen Fan
17.	HiveClient.getPartitionsByFilter throws an exception for some unsupported filters when hive.metastore.try.direct.sql=false	Resolved	Michael MacFadden
18.	Rename partitionProviderIsHive -> tracksPartitionsInCatalog	Resolved	Reynold Xin
19.	ALTER TABLE ... ADD PARTITION does not play nice with mixed-case partition column names	Resolved	Wenchen Fan
20.	correct several partition related behaviours of ExternalCatalog	Resolved	Wenchen Fan
21.	Revert hacks in parquet and orc reader to support case insensitive resolution	Resolved	Eric Liang
22.	Should fix INSERT OVERWRITE TABLE of Datasource tables with dynamic partitions	Resolved	Eric Liang
23.	Update documentation for hive partition management in 2.1	Resolved	Eric Liang
24.	Append with df.saveAsTable writes data to wrong location	Resolved	Eric Liang
25.	Major performance regression in SHOW PARTITIONS on partitioned Hive tables	Resolved	Wenchen Fan
26.	Verify number of hive client RPCs in PartitionedTablePerfStatsSuite	Resolved	Eric Liang
27.	Partition name/values not escaped correctly in some cases	Resolved	Eric Liang
28.	Regression in file listing performance	Resolved	Eric Liang
29.	Incorrect behaviors in overwrite table for datasource tables	Resolved	Eric Liang
30.	Creating a partitioned datasource table should not scan all files for table	Resolved	Eric Liang
31.	Return Nothing when Querying a Partitioned Data Source Table without Repairing it	Closed	Unassigned

People

Assignee: Eric Liang
Reporter: Reynold Xin
Votes: 0
Watchers: 14

Dates

Created: 11/Oct/16 00:57
Updated: 01/Aug/18 18:11
Resolved: 30/Nov/16 04:06