Details
- New Feature
- Status: Open
- Major
- Resolution: Unresolved
- 3.1.0
- None
- None
Description
AWS Glue Catalog is an external Hive metastore backed by a web service. It provides persistent storage of catalog data for big data use cases.
For more information about AWS Glue, see:
- AWS Glue - https://aws.amazon.com/glue/
- Using Glue as a Metastore catalog for Spark - https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html
Today, Glue integrates with Spark through the Hive layer. Glue implements Hive's IMetaStoreClient interface, so Spark installations that include Hive can use Glue as the metastore.
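As a rough illustration of the Hive-based route, the sketch below collects the Spark properties such a setup typically relies on and renders them as spark-submit flags. The factory class name is the one given in the EMR guide linked above; the `spark.hadoop.` prefix and the helper name `render_submit_flags` are assumptions made for illustration, and a real deployment may set these values through cluster configuration instead.

```python
# Properties assumed for a Hive-enabled Spark build that talks to Glue.
GLUE_VIA_HIVE_CONF = {
    # Use Spark's Hive-backed catalog rather than the in-memory one.
    "spark.sql.catalogImplementation": "hive",
    # Ask Hive to create its metastore client through the Glue factory.
    # The "spark.hadoop." prefix is an assumption about how the property
    # is forwarded to the Hadoop/Hive configuration.
    "spark.hadoop.hive.metastore.client.factory.class":
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
}


def render_submit_flags(conf):
    """Return spark-submit arguments, one --conf pair per property (sorted by key)."""
    flags = []
    for key in sorted(conf):
        flags.extend(["--conf", f"{key}={conf[key]}"])
    return flags


if __name__ == "__main__":
    print(" ".join(render_submit_flags(GLUE_VIA_HIVE_CONF)))
```

Both properties pass through Hive, which is the coupling this issue proposes to remove.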
The feature set that Glue supports does not align one-to-one with the features that the latest version of Spark supports. For example, the Glue interface supports more advanced partition pruning than the latest version of Hive embedded in Spark.
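To make the partition pruning point concrete, here is a minimal sketch of turning partition predicates into a single filter string that a catalog service could evaluate server-side, so that only matching partitions are returned. The function names, the operator set, and the WHERE-clause-like syntax are assumptions for illustration; this is not a statement of the exact grammar Glue or Hive accepts.

```python
# Comparison operators this sketch is willing to push down (an assumption).
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def to_literal(value):
    """Quote strings (doubling embedded single quotes); render other values with str()."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_partition_filter(predicates):
    """Join (column, operator, value) triples into one AND-ed filter string."""
    parts = []
    for column, op, value in predicates:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        parts.append(f"{column} {op} {to_literal(value)}")
    return " AND ".join(parts)


if __name__ == "__main__":
    # A string predicate combined with a fractional one.
    print(build_partition_filter([("region", "=", "eu"), ("price", ">=", 9.5)]))
```

A predicate the target metastore cannot evaluate has to be dropped from the pushed-down filter and applied by Spark after listing partitions, which is where differences in supported types and operators become visible.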
To enable a more natural integration with Spark, and to let Spark use the latest Glue features without being coupled to Hive, a direct integration through Spark's own Catalog API is proposed. This Jira tracks that work.
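The shape of the proposed direct integration can be sketched in a few lines: a catalog implementation that calls the service client itself, with no Hive metastore layer in between. Everything here is hypothetical and for illustration only: `CatalogSketch` is a trimmed-down stand-in rather than Spark's actual catalog interface, and `FakeGlueClient` is an in-memory substitute rather than an AWS client.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMeta:
    """Minimal table description used by this sketch."""
    database: str
    name: str
    partition_columns: tuple = ()


class CatalogSketch(ABC):
    """Hypothetical, trimmed-down stand-in for an engine-side catalog interface."""

    @abstractmethod
    def list_tables(self, database): ...

    @abstractmethod
    def get_table(self, database, name): ...


class FakeGlueClient:
    """In-memory stand-in for a catalog web service; not a real AWS client."""

    def __init__(self, tables):
        self._tables = {(t.database, t.name): t for t in tables}

    def fetch_all(self, database):
        return [t for (db, _), t in self._tables.items() if db == database]

    def fetch_one(self, database, name):
        return self._tables.get((database, name))


class GlueCatalogSketch(CatalogSketch):
    """Calls the service client directly, with no Hive layer in between."""

    def __init__(self, client):
        self._client = client

    def list_tables(self, database):
        return sorted(t.name for t in self._client.fetch_all(database))

    def get_table(self, database, name):
        table = self._client.fetch_one(database, name)
        if table is None:
            raise KeyError(f"{database}.{name} not found")
        return table
```

Because the engine depends only on the abstract interface, a Glue-backed implementation could expose service features directly instead of being limited to what the Hive client interface can express.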
Issue Links
- contains
  - SPARK-22913 Hive Partition Pruning, Fractional and Timestamp types (Resolved)
- is related to
  - SPARK-30617 Is there any possible that spark no longer restrict enumerate types of spark.sql.catalogImplementation (Closed)
- relates to
  - SPARK-17767 Spark SQL ExternalCatalog API custom implementation support (Closed)
  - SPARK-15777 Catalog federation (Resolved)
  - SPARK-24017 Refactor ExternalCatalog to be an interface (Resolved)