Spark / SPARK-24814

Relationship between catalog and datasources


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Fix Version/s: 3.0.0
    • Affects Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      This is somewhat related to, though not identical to, rdblue's SPIP on datasources and catalogs.

      Here are the requirements (IMO) for fully implementing V2 datasources and their relationships to catalogs:

      1. The global catalog should be configurable (the default can be the Hive Metastore (HMS), but it should be overridable).
      2. The default catalog (or an explicitly specified catalog in a query, once multiple catalogs are supported) can determine the V2 datasource to use for reading and writing the data.
      3. Once multiple catalogs are supported, a user should be able to specify a catalog on spark.read and df.write operations. As noted above, the catalog would determine the datasource to use for the read or write operation.
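      For illustration, the catalog-plugin configuration that Spark 3.x eventually exposed matches requirement #1: catalogs are registered by name via configuration, and the implementation class behind each name determines how tables are resolved, read, and written. A minimal sketch (the catalog name `acmex` and class `com.acme.AcmexCatalog` are hypothetical placeholders for a CatalogPlugin implementation):

      ```properties
      # spark-defaults.conf — register a custom catalog named "acmex"
      # (com.acme.AcmexCatalog is a hypothetical CatalogPlugin implementation)
      spark.sql.catalog.acmex=com.acme.AcmexCatalog

      # Make it the default catalog instead of the built-in session catalog,
      # satisfying the "default should be overridable" requirement above
      spark.sql.defaultCatalog=acmex
      ```

      With such a configuration, unqualified table names resolve through the configured default catalog rather than being hard-wired to HMS.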

      Old #3:
      Conversely, a V2 datasource can determine which catalog to use for resolution (e.g., if the user issues spark.read.format("acmex").table("mytable"), the acmex datasource would decide which catalog to use for resolving “mytable”).
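      For contrast with the old #3, the direction Spark ultimately took is the reverse dependency: the catalog is named explicitly in the table identifier and the catalog supplies the datasource, rather than the datasource choosing the catalog. A sketch using the same hypothetical `acmex` catalog and table names:

      ```sql
      -- Catalog-qualified identifier: the "acmex" catalog resolves
      -- db.mytable and supplies the datasource implementation used
      -- for the read; no format() hint is needed.
      SELECT * FROM acmex.db.mytable;
      ```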

            People

              Assignee: Unassigned
              Reporter: Bruce Robbins (bersprockets)
              Votes: 1
              Watchers: 6
