  1. Spark
  2. SPARK-36163

Propagate correct JDBC properties in JDBC connector provider and add "connectionProvider" option


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 3.1.1, 3.1.2
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      There are a couple of issues with JDBC connection providers. The first is a bug introduced by https://github.com/apache/spark/commit/c3ce9701b458511255072c72b9b245036fa98653: all properties, including JDBC data source keys, are passed to the JDBC driver, which results in errors like java.sql.SQLException: Unrecognized connection property 'url'.

      Connection properties are supposed to include only vendor properties; url is a JDBC data source option and should be excluded.

      The fix would be to replace jdbcOptions.asProperties.asScala.foreach with jdbcOptions.asConnectionProperties.asScala.foreach, which is java.sql.Driver friendly.
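
      To illustrate the difference between the two property sets, here is a minimal standalone sketch. It is not Spark's JDBCOptions implementation: the object name, the helper signatures, and the short list of data source keys are assumptions made for the example.

```scala
import java.util.{Locale, Properties}

object ConnectionPropertiesSketch {
  // Assumed, illustrative subset of JDBC data source option names (lower-cased).
  val dataSourceKeys: Set[String] = Set("url", "dbtable", "driver", "fetchsize")

  // Every option supplied by the user, data source keys included.
  def asProperties(options: Map[String, String]): Properties = {
    val props = new Properties()
    options.foreach { case (key, value) => props.setProperty(key, value) }
    props
  }

  // Only the entries intended for the JDBC driver: data source keys are dropped.
  def asConnectionProperties(options: Map[String, String]): Properties =
    asProperties(options.filter { case (key, _) =>
      !dataSourceKeys.contains(key.toLowerCase(Locale.ROOT))
    })
}
```

      With options such as url, dbtable and user, the first helper keeps url while the second returns only user, which is the kind of input a driver that rejects unknown properties would accept.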

       

      I also investigated the problem with multiple providers and found a couple of oversights in the ConnectionProvider implementation. I think it is missing two things:

      • Any JdbcConnectionProvider should take precedence over BasicConnectionProvider. BasicConnectionProvider should be selected only if no match was found when inferring providers that can handle the JDBC url.
      • There is currently no way to select a specific provider, similar to how you can select a JDBC driver. The use case is, for example, having connection providers for two databases that handle the same URL but have slightly different semantics, where you want to select one in some cases and the other in others.
        • I think the first point could be discarded when the second one is addressed.
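
      The two points above can be sketched as a small selection routine. This is only an illustration under stated assumptions: Provider is a hypothetical stand-in rather than Spark's JdbcConnectionProvider, "basic" is assumed to be the name of the fallback provider, and reporting ambiguity as an error is a choice made for the example.

```scala
object ProviderSelectionSketch {
  // Hypothetical stand-in for a connection provider; not Spark's actual trait.
  final case class Provider(name: String, canHandle: String => Boolean)

  def select(
      providers: Seq[Provider],
      url: String,
      requested: Option[String]): Either[String, Provider] = {
    val candidates = providers.filter(_.canHandle(url))
    requested match {
      // Second point: an explicitly requested provider wins.
      case Some(name) =>
        candidates.find(_.name == name)
          .toRight(s"No provider named '$name' can handle $url")
      // First point: specific providers take precedence over the basic one.
      case None =>
        candidates.filterNot(_.name == "basic") match {
          case Seq(single) => Right(single)
          case Seq() =>
            candidates.find(_.name == "basic")
              .toRight(s"No provider can handle $url")
          case many =>
            Left(s"Ambiguous providers for $url: ${many.map(_.name).mkString(", ")}")
        }
    }
  }
}
```

      In this sketch, passing an explicit name resolves the case where two providers accept the same URL, which is why the precedence rule becomes less important once explicit selection exists.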

      You can technically use spark.sql.sources.disabledJdbcConnProviderList to exclude providers that should not be considered, but I am not quite sure why it was done that way; it is much simpler to allow users to enforce the provider they want.

      This ticket addresses both issues, the second one by adding a connectionProvider option to the JDBC data source that allows users to select a particular provider when ambiguity arises.
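
      A possible usage of the new option from the DataFrame reader is sketched below. It assumes a Spark build that includes this change; the object and method names, the url, the table, and the provider name passed in are placeholders, and the provider name must match a provider that is actually registered.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ConnectionProviderUsage {
  // Reads a table over JDBC, pinning the connection provider by name.
  // All arguments are caller-supplied placeholders for this sketch.
  def readTable(
      spark: SparkSession,
      url: String,
      table: String,
      providerName: String): DataFrame =
    spark.read
      .format("jdbc")
      .option("url", url)
      .option("dbtable", table)
      // Option added by this ticket; selects one provider explicitly.
      .option("connectionProvider", providerName)
      .load()
}
```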

          People

            4thhorseman Ivan
            ivan.sadikov Ivan Sadikov