Spark / SPARK-24814

Relationship between catalog and datasources


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Fix Version/s: 3.0.0
    • Affects Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      This is somewhat related to, though not identical to, rdblue's SPIP on datasources and catalogs.

      Here are the requirements (IMO) for fully implementing V2 datasources and their relationships to catalogs:

      1. The global catalog should be configurable (the default can be the Hive Metastore (HMS), but it should be overridable).
      2. The default catalog (or an explicitly specified catalog in a query, once multiple catalogs are supported) can determine the V2 datasource to use for reading and writing the data.
      3. Once multiple catalogs are supported, a user should be able to specify a catalog on spark.read and df.write operations. As noted above, the catalog would determine the datasource to use for the read or write operation.
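      For illustration, the catalog-plugin configuration that Spark 3.x eventually exposed matches requirement #1: catalogs are registered by name via configuration, and the implementation class behind each name determines how tables are resolved, read, and written. A minimal sketch (the catalog name `acmex` and class `com.acme.AcmexCatalog` are hypothetical placeholders for a CatalogPlugin implementation):

      ```properties
      # spark-defaults.conf — register a custom catalog named "acmex"
      # (com.acme.AcmexCatalog is a hypothetical CatalogPlugin implementation)
      spark.sql.catalog.acmex=com.acme.AcmexCatalog

      # Make it the default catalog instead of the built-in session catalog,
      # satisfying the "default should be overridable" requirement above
      spark.sql.defaultCatalog=acmex
      ```

      With such a configuration, unqualified table names resolve through the configured default catalog rather than being hard-wired to HMS.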

      Old #3:
      Conversely, a V2 datasource can determine which catalog to use for resolution (e.g., if the user issues spark.read.format("acmex").table("mytable"), the acmex datasource would decide which catalog to use for resolving “mytable”).
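      For contrast with the old #3, the direction Spark ultimately took is the reverse dependency: the catalog is named explicitly in the table identifier and the catalog supplies the datasource, rather than the datasource choosing the catalog. A sketch using the same hypothetical `acmex` catalog and table names:

      ```sql
      -- Catalog-qualified identifier: the "acmex" catalog resolves
      -- db.mytable and supplies the datasource implementation used
      -- for the read; no format() hint is needed.
      SELECT * FROM acmex.db.mytable;
      ```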

            People

              Assignee: Unassigned
              Reporter: Bruce Robbins (bersprockets)
              Votes: 1
              Watchers: 6
