There are a couple of issues with JDBC connection providers. The first is a bug caused by https://github.com/apache/spark/commit/c3ce9701b458511255072c72b9b245036fa98653, where we pass all properties, including JDBC data source keys, to the JDBC driver, which results in errors like `java.sql.SQLException: Unrecognized connection property 'url'`.
Connection properties are supposed to include only vendor-specific properties; the `url` config is a JDBC data source option and should be excluded.
The fix is to replace `jdbcOptions.asProperties.asScala.foreach` with `jdbcOptions.asConnectionProperties.asScala.foreach`, since `asConnectionProperties` filters out the data source options and is therefore `java.sql.Driver` friendly.
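Roughly, the one-line change looks like the sketch below (the surrounding variable names and the exact file are illustrative, not taken from the actual patch):

```scala
// Before: every JDBC option, including data source keys such as "url"
// and "dbtable", is forwarded to the driver, which some drivers reject.
jdbcOptions.asProperties.asScala.foreach { case (k, v) =>
  properties.put(k, v)
}

// After: only vendor-specific connection properties are forwarded,
// which is what java.sql.Driver.connect expects to receive.
jdbcOptions.asConnectionProperties.asScala.foreach { case (k, v) =>
  properties.put(k, v)
}
```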
I also investigated the problem with multiple providers, and I think there are a couple of oversights in the `ConnectionProvider` implementation. It is missing two things:
- Any `JdbcConnectionProvider` should take precedence over `BasicConnectionProvider`. `BasicConnectionProvider` should only be selected if no other provider matched when inferring which providers can handle the JDBC url.
- There is currently no way to select a specific provider, similar to how you can select a JDBC driver. The use case: you have connection providers for two databases that handle the same URL but have slightly different semantics, and you want to select one provider in some cases and the other in others.
- I think the first point could be discarded when the second one is addressed.
You can technically use `spark.sql.sources.disabledJdbcConnProviderList` to exclude the providers you don't want, but I am not quite sure why it was designed that way: it is much simpler to let users enforce the provider they want.
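For reference, the existing workaround looks roughly like this (the disabled provider names below are illustrative; the actual names depend on which providers are registered):

```scala
import org.apache.spark.sql.SparkSession

// Disable providers by name at session creation time so that the one
// remaining matching provider wins the inference step.
val spark = SparkSession.builder()
  .appName("jdbc-provider-workaround")
  .config("spark.sql.sources.disabledJdbcConnProviderList", "db2,mssql")
  .getOrCreate()
```

The drawback is that this is a session-wide setting: it cannot pick different providers for two reads of the same URL within one session, which the per-source option below addresses.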
This ticket fixes it by adding a `connectionProvider` option to the JDBC data source that allows users to select a particular provider when ambiguity arises.
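Usage would look roughly like the sketch below (the URL, table, and provider name are placeholders, not values from the ticket):

```scala
// Select a specific connection provider per data source read,
// instead of disabling the unwanted ones globally.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:db2://host:50000/sample")   // illustrative URL
  .option("dbtable", "mytable")                    // illustrative table
  .option("connectionProvider", "db2")             // pick the provider by name
  .load()
```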