Description
Spark SQL currently does not support creating a DataFrame from a Postgres table that contains a user-defined array column. It used to allow such columns before the Postgres JDBC commit (https://github.com/pgjdbc/pgjdbc/commit/375cb3795c3330f9434cee9353f0791b86125914); the previous behavior was to handle a user-defined array column as String.
Given:
- Postgres table with user-defined array column
- Function: DataFrameReader.jdbc - https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/DataFrameReader.html#jdbc-java.lang.String-java.lang.String-java.util.Properties-
Results:
- Exception “java.sql.SQLException: Unsupported type ARRAY” is thrown
Expectation after the change:
- Function call succeeds
- The user-defined array column is read as a string in the Spark DataFrame
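The failing call can be reproduced with a sketch like the following (the connection URL, credentials, and table name are placeholders, not values from this report):

```scala
import java.util.Properties

import org.apache.spark.sql.SparkSession

object ReproSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udt-array-repro").getOrCreate()

    val props = new Properties()
    props.setProperty("user", "postgres")     // placeholder credentials
    props.setProperty("password", "postgres")

    // Before the change: throws java.sql.SQLException: Unsupported type ARRAY
    // when "my_table" has a column whose type is an array of a user-defined type.
    // After the change: succeeds, with that column read as StringType.
    val df = spark.read.jdbc(
      "jdbc:postgresql://localhost:5432/mydb", // placeholder URL
      "my_table",                              // placeholder table name
      props)
    df.printSchema()
  }
}
```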
Suggested fix:
- Update the “getCatalystType” function in “PostgresDialect” as follows:

  val catalystType = toCatalystType(typeName.drop(1), size, scale).map(ArrayType(_))
  if (catalystType.isEmpty) Some(StringType) else catalystType
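The fallback logic of the suggested fix can be sketched in isolation. The types and the `toCatalystType` stub below are simplified stand-ins for the real Spark internals (the actual code lives in `org.apache.spark.sql.jdbc.PostgresDialect` and uses `org.apache.spark.sql.types`), kept minimal so the behavior is easy to check:

```scala
// Simplified stand-ins for Spark's Catalyst types (assumption: the real code
// uses org.apache.spark.sql.types.{DataType, ArrayType, StringType, ...}).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case class ArrayType(elementType: DataType) extends DataType

object ArrayFallbackSketch {
  // Simplified stand-in for PostgresDialect.toCatalystType: maps a Postgres
  // type name to a Catalyst type, or None when the name is unknown.
  def toCatalystType(typeName: String): Option[DataType] = typeName match {
    case "int4" => Some(IntegerType)
    case "text" => Some(StringType)
    case _      => None // e.g. a user-defined type
  }

  // Postgres reports array types with a leading underscore (e.g. "_int4"),
  // hence the drop(1). The suggested fix: when the element type is unknown
  // (user-defined), fall back to StringType instead of returning None and
  // later failing with "Unsupported type ARRAY".
  def arrayCatalystType(pgTypeName: String): Option[DataType] = {
    val catalystType = toCatalystType(pgTypeName.drop(1)).map(ArrayType(_))
    if (catalystType.isEmpty) Some(StringType) else catalystType
  }

  def main(args: Array[String]): Unit = {
    println(arrayCatalystType("_int4"))   // known element type: array is kept
    println(arrayCatalystType("_mytype")) // user-defined type: string fallback
  }
}
```

A known element type still produces an array column; only unknown (user-defined) element types take the String fallback, which matches the pre-commit behavior described above.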