Details
- Type: New Feature
- Status: Resolved
- Resolution: Resolved
- Priority: Major
- Fix Version: 3.0.0
Description
This is somewhat related to, though not identical to, rdblue's SPIP on datasources and catalogs.
Here are the requirements (IMO) for fully implementing V2 datasources and their relationships to catalogs:
- The global catalog should be configurable (the default can be HMS, but it should be overridable).
- The default catalog (or an explicitly specified catalog in a query, once multiple catalogs are supported) can determine the V2 datasource to use for reading and writing the data.
- Once multiple catalogs are supported, a user should be able to specify a catalog on spark.read and df.write operations. As noted above, the catalog would determine the datasource to use for the read or write operation.
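One possible shape for this configuration (the property names below are illustrative, loosely following the catalog-plugin settings that Spark 3.x eventually adopted; the catalog class name is a made-up placeholder):

```
# Override the global/default catalog (HMS remains the default if unset)
spark.sql.defaultCatalog=my_catalog

# Register a named catalog backed by a specific implementation class;
# that catalog then determines the V2 datasource used for reads and writes
spark.sql.catalog.my_catalog=com.example.MyCatalogImpl
```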
Old #3:
Conversely, a V2 datasource can determine which catalog to use for resolution (e.g., if the user issues spark.read.format("acmex").table("mytable"), the acmex datasource would decide which catalog to use for resolving "mytable").
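The two resolution directions described above might look like this in user code (a sketch against the proposed behavior, not the shipped API; "acmex" and "mytable" are the hypothetical names from the example):

```scala
// Direction 1: the catalog picks the datasource. With a default (or
// explicitly configured) catalog, the catalog resolves "mytable" and
// chooses the V2 datasource implementation for the read:
val df1 = spark.read.table("mytable")

// Direction 2: the datasource picks the catalog. The acmex V2 datasource,
// not the session's default catalog, decides how "mytable" is resolved:
val df2 = spark.read
  .format("acmex")   // selects the V2 datasource implementation
  .table("mytable")  // the datasource chooses the resolving catalog
```

In the shipped Spark 3.x design, the first direction became the primary mechanism (catalog plugins own table resolution); this issue sketches both as requirements.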
Issue Links
- is related to:
  - SPARK-15691 Refactor and improve Hive support (Resolved)
  - SPARK-15777 Catalog federation (Resolved)