Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
2.4.7, 3.1.0
Description
Most JDBC drivers use long SQL type for dataset row ID:
(in org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils)
private def getCatalystType( sqlType: Int, precision: Int, scale: Int, signed: Boolean): DataType = { val answer = sqlType match { // scalastyle:off ... case java.sql.Types.ROWID => LongType ... case _ => throw new SQLException("Unrecognized SQL type " + sqlType) // scalastyle:on } if (answer == null) { throw new SQLException("Unsupported type " + JDBCType.valueOf(sqlType).getName) } answer
Oracle JDBC drivers (of all versions) are rare exception, only String value can be extracted:
(in oracle.jdbc.driver.RowidAccessor, decompiled bytecode)
... String getString(int var1) throws SQLException { return this.isNull(var1) ? null : this.rowData.getString(this.getOffset(var1), this.getLength(var1), this.statement.connection.conversion.getCharacterSet((short)1)); } Object getObject(int var1) throws SQLException { return this.getROWID(var1); } ...
This caused an exception to be thrown when importing datasets from an Oracle DB, as reported in https://stackoverflow.com/questions/52244492/spark-jdbc-dataframereader-fails-to-read-oracle-table-with-datatype-as-rowid:
{{18/09/08 11:38:17 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 23, gbrdsr000002985.intranet.barcapint.com, executor 21): java.sql.SQLException: Invalid column type: getLong not implemented for class oracle.jdbc.driver.T4CRowidAccessor at oracle.jdbc.driver.GeneratedAccessor.getLong(GeneratedAccessor.java:440) at oracle.jdbc.driver.GeneratedStatement.getLong(GeneratedStatement.java:228) at oracle.jdbc.driver.GeneratedScrollableResultSet.getLong(GeneratedScrollableResultSet.java:620) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)}}
Therefore, the default SQL type => Catalyst type conversion rule should be overriden in OracleDialect. Specifically, the following rule should be added:
case Types.ROWID => Some(StringType)