Spark / SPARK-25186 Stabilize Data Source V2 API / SPARK-25329

Support passing Kerberos configuration information


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:

      Description

      The current Data Source V2 API lets a data source query a portion of the SparkConf namespace (spark.datasource.*) via the SessionConfigSupport API. This was designed with the assumption that the configuration of each v2 data source is independent of the others.
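      The behaviour of SessionConfigSupport can be sketched roughly as follows. This is a simplified, self-contained stand-in rather than Spark's actual implementation; the class name, method name, and the "mysource" key prefix are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of SessionConfigSupport-style extraction: a data source
// that declares keyPrefix "mysource" only ever sees keys under
// "spark.datasource.mysource.", with that prefix stripped off.
public class SessionConfigSketch {
    static Map<String, String> extractSessionConfigs(Map<String, String> sessionConf,
                                                     String keyPrefix) {
        String prefix = "spark.datasource." + keyPrefix + ".";
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : sessionConf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                // Strip the namespace so the source sees only its own short keys.
                result.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.datasource.mysource.url", "jdbc:foo");
        conf.put("spark.yarn.principal", "user@EXAMPLE.COM");
        // Only the mysource.* entry is visible; spark.yarn.principal is not.
        System.out.println(extractSessionConfigs(conf, "mysource"));
    }
}
```

      This filtering is exactly why cross-cutting settings such as spark.yarn.principal are invisible to a data source today.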

      Unfortunately, some cross-cutting concerns, such as authentication, touch multiple data sources; this means that common configuration items need to be shared amongst multiple data sources.

      In particular, Kerberos setup can use the following configuration items:

      1. userPrincipal (Spark configuration: spark.yarn.principal)
      2. userKeytabPath (Spark configuration: spark.yarn.keytab)
      3. krb5ConfPath (system property: java.security.krb5.conf)
      4. Kerberos debugging flag (system property: sun.security.krb5.debug)
      5. spark.security.credentials.${service}.enabled
      6. JAAS config (system property: java.security.auth.login.config) ??
      7. ZKServerPrincipal ??
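      Note that items 3, 4 and 6 above are JVM system properties rather than Spark configuration, so today they can only be set process-wide, not scoped to one data source. A minimal illustration (the file paths are hypothetical placeholders):

```java
// Sketch only: the krb5/JAAS settings are plain JVM system properties,
// shared by every data source running in the same JVM.
public class KerberosJvmProps {
    public static void main(String[] args) {
        // Hypothetical paths; adjust for your environment.
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
        System.setProperty("sun.security.krb5.debug", "true");
        System.setProperty("java.security.auth.login.config", "/etc/jaas.conf");

        // Any code in this JVM now observes the same values.
        System.out.println(System.getProperty("java.security.krb5.conf"));
    }
}
```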

      Potential solutions for passing this information to the various data sources are:

      1. Pass the entire SparkContext object to data sources (not likely)
      2. Pass the entire SparkConf, as a Map object, to data sources
      3. Pass all required configuration via environment variables
      4. Extend SessionConfigSupport to support passing specific white-listed configuration values
      5. Add a specific Data Source V2 API, "SupportsKerberos", so that a data source can indicate that it supports Kerberos and also provide the means to pass the needed configuration info.
      6. Duplicate all Kerberos configuration items into the config namespace of each data source that needs them.
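      Option 5 could look something like the sketch below. This interface does not exist in Spark; every name here is invented purely to illustrate the shape of the proposal:

```java
// Hypothetical sketch of option 5: a mixin interface through which Spark
// could hand shared Kerberos settings to any data source that opts in.
public class SupportsKerberosSketch {
    // Bundle of the configuration items listed in the description.
    public static final class KerberosConfig {
        public final String userPrincipal;
        public final String userKeytabPath;
        public final String krb5ConfPath;
        public final boolean debug;

        public KerberosConfig(String userPrincipal, String userKeytabPath,
                              String krb5ConfPath, boolean debug) {
            this.userPrincipal = userPrincipal;
            this.userKeytabPath = userKeytabPath;
            this.krb5ConfPath = krb5ConfPath;
            this.debug = debug;
        }
    }

    // A data source implementing this would signal Kerberos support and
    // receive the shared settings from Spark at setup time.
    public interface SupportsKerberos {
        void setKerberosConfig(KerberosConfig config);
    }

    public static void main(String[] args) {
        KerberosConfig cfg = new KerberosConfig(
            "user@EXAMPLE.COM", "/etc/user.keytab", "/etc/krb5.conf", false);
        // A toy "data source" that just echoes what it was handed.
        SupportsKerberos source =
            c -> System.out.println("received principal: " + c.userPrincipal);
        source.setKerberosConfig(cfg);
    }
}
```

      The advantage over option 2 is that Spark stays in control of exactly which settings cross the boundary, rather than exposing the whole configuration.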

      If the data source requires TLS support, then we also need to support passing all the configuration values under "spark.ssl.*".

       


            People

            • Assignee: Unassigned
            • Reporter: Dale Richardson (tigerquoll)
            • Votes: 0
            • Watchers: 2
