Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-21519

Add an option to the JDBC data source to initialize the environment of the remote database session

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 2.1.0, 2.1.1, 2.2.0
    • 2.3.0
    • SQL
    • None

    Description

      This proposes an option to the JDBC datasource, tentatively called "sessionInitStatement" to implement the functionality of session initialization present for example in the Sqoop connector for Oracle (see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements) . After each database session is opened to the remote DB, and before starting to read data, this option executes a custom SQL statement (or a PL/SQL block in the case of Oracle).
      Example of usage, relevant to Oracle JDBC:

      val preambleSQL="""
      begin 
        execute immediate 'alter session set tracefile_identifier=sparkora'; 
        execute immediate 'alter session set "_serial_direct_read"=true';
        execute immediate 'alter session set time_zone=''+02:00''';
      end;
      """
      
      bin/spark-shell --jars ojdb6.jar
      
      val df = spark.read
                 .format("jdbc")
                 .option("url", "jdbc:oracle:thin:@ORACLEDBSERVER:1521/service_name")
                 .option("driver", "oracle.jdbc.driver.OracleDriver")
                 .option("dbtable", "(select 1, sysdate, systimestamp, current_timestamp, localtimestamp from dual)")
                 .option("user", "MYUSER")
                 .option("password", "MYPASSWORD").option("fetchsize",1000)
                 .option("sessionInitStatement", preambleSQL)
                 .load()
      
      df.show(5,false)
      

      Comments: This proposal has been developed and tested for connecting the Spark JDBC data source to Oracle databases, however I believe it can be useful for other target DBs too, as it is quite generic.
      The code executed by the option "sessionInitStatement" is just the user-provided string fed through the execute method of the JDBC connection, so it can use the features of the target database language/syntax. When using sessionInitStatement for querying Oracle, for example, the user-provided command can be a SQL statement or a PL/SQL block grouping multiple commands and logic.
      Note the proposed code allows to inject SQL into the target database. This is not a security concern as such, as it requires password authentication, however beware of the possibilities of injecting user-provided SQL (and PL/SQL) that this opens.

      Attachments

        Activity

          People

            lucacanali Luca Canali
            lucacanali Luca Canali
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: