Spark / SPARK-32992

In OracleDialect, "RowID" SQL type should be converted into "String" Catalyst type


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.7, 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: SQL

      Description

      Most JDBC drivers return a dataset's row ID as a long value, so Spark's default conversion maps java.sql.Types.ROWID to LongType:

       

      (in org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils)

      private def getCatalystType(
          sqlType: Int,
          precision: Int,
          scale: Int,
          signed: Boolean): DataType = {
        val answer = sqlType match {
          // scalastyle:off
          ...
          case java.sql.Types.ROWID => LongType
          ...
          case _ =>
            throw new SQLException("Unrecognized SQL type " + sqlType)
          // scalastyle:on
        }
        if (answer == null) {
          throw new SQLException("Unsupported type " + JDBCType.valueOf(sqlType).getName)
        }
        answer
      }


      Oracle JDBC drivers (of all versions) are a rare exception: only a String value can be extracted:

       

      (in oracle.jdbc.driver.RowidAccessor, decompiled bytecode)

      ...
      String getString(int var1) throws SQLException {
        return this.isNull(var1) ? null : this.rowData.getString(this.getOffset(var1),
            this.getLength(var1), this.statement.connection.conversion.getCharacterSet((short)1));
      }

      Object getObject(int var1) throws SQLException {
        return this.getROWID(var1);
      }
      ...


      This causes an exception to be thrown when importing datasets from an Oracle DB, as reported in https://stackoverflow.com/questions/52244492/spark-jdbc-dataframereader-fails-to-read-oracle-table-with-datatype-as-rowid:

       
       18/09/08 11:38:17 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 23, gbrdsr000002985.intranet.barcapint.com, executor 21): java.sql.SQLException: Invalid column type: getLong not implemented for class oracle.jdbc.driver.T4CRowidAccessor
           at oracle.jdbc.driver.GeneratedAccessor.getLong(GeneratedAccessor.java:440)
           at oracle.jdbc.driver.GeneratedStatement.getLong(GeneratedStatement.java:228)
           at oracle.jdbc.driver.GeneratedScrollableResultSet.getLong(GeneratedScrollableResultSet.java:620)
           at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365)
           at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)

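      Until a Spark build with a dialect fix is available, a common workaround is to convert the ROWID to a character column on the Oracle side (via Oracle's ROWIDTOCHAR function), so the JDBC driver hands Spark a plain VARCHAR2. A sketch of building such a pushed-down subquery; the table name used here is hypothetical:

      ```scala
      // Workaround sketch (table name is hypothetical): wrap the table in a
      // subquery that converts ROWID to CHAR with Oracle's ROWIDTOCHAR, so
      // Spark never sees a ROWID-typed column.
      def rowidSafeQuery(table: String): String =
        s"(SELECT ROWIDTOCHAR(ROWID) AS row_id, t.* FROM $table t)"
      ```

      The resulting string can be passed as the `dbtable` option of `spark.read.format("jdbc")`.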

       

      Therefore, the default SQL type => Catalyst type conversion rule should be overridden in OracleDialect. Specifically, the following rule should be added:

      case Types.ROWID => Some(StringType)
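
      The dialect's getCatalystType hook returns Some(dataType) for types it overrides and None to fall back to the default mapping. A minimal self-contained sketch of that dispatch, using a stand-in DataType object instead of Spark's Catalyst classes, for illustration only:

      ```scala
      import java.sql.Types

      // Stand-ins for Spark's Catalyst type objects, for illustration only.
      sealed trait DataType
      case object StringType extends DataType

      // Sketch of the dialect-level override: Oracle's ROWID maps to a string
      // type; any other SQL type returns None, deferring to the default
      // JdbcUtils mapping.
      def oracleGetCatalystType(sqlType: Int): Option[DataType] = sqlType match {
        case Types.ROWID => Some(StringType)
        case _           => None
      }
      ```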
      

       

      Attachments

      Activity

      People

      • Assignee: Max Gekk (maxgekk)
      • Reporter: Peng Cheng (peng)
      • Votes: 0
      • Watchers: 3

      Dates

      • Created:
      • Updated:
      • Resolved: