Description
Spark SQL currently does not store any partition information in the catalog for data source tables, because it was originally designed to work with arbitrary files. This has a few drawbacks for catalog tables:
1. Listing partitions for a large table (with millions of partitions) can be very slow during a cold start.
2. Heterogeneous partition naming schemes are not supported.
3. Partition pruning cannot be pushed down into the metastore, so partitions must be listed and filtered on the driver.
This ticket tracks the work required to move partition tracking into the metastore. The change should be guarded by a feature flag.
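As a rough sketch of how this might look once implemented (the configuration key, table, and query below are illustrative assumptions, not part of this ticket):

```sql
-- Sketch only: the flag name is an assumption about how the feature
-- might be exposed; the table and query are hypothetical.
SET spark.sql.hive.manageFilesourcePartitions=true;

-- A partitioned data source table whose partitions would be tracked in
-- the metastore instead of being discovered by listing files.
CREATE TABLE logs (msg STRING, ds STRING)
USING parquet
PARTITIONED BY (ds);

-- With partition metadata in the metastore, the predicate on ds could be
-- pushed to the metastore, avoiding a full file listing at query time.
SELECT msg FROM logs WHERE ds = '2016-10-01';
```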
Issue Links
- is depended upon by: HADOOP-13525 Optimize uses of FS operations in the ASF analysis frameworks and libraries (Resolved)
- is related to: SPARK-15691 Refactor and improve Hive support (Resolved)
- relates to: SPARK-16980 Load only catalog table partition metadata required to answer a query (Resolved)