Spark / SPARK-18687

Backward compatibility - creating a Dataframe on a new SQLContext object fails with a Derby error

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.0.1, 2.0.2
    • Fix Version/s: 2.0.3, 2.1.1, 2.2.0
    • Component/s: PySpark, SQL
    • Labels: None
    • Environment: Spark built with Hive support

    Description

      With a local Spark instance built with Hive support (-Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver), the following script runs without error in PySpark 1.6.x but fails in 2.x:

      # Run in the PySpark shell, where `sc` and `sqlContext` are predefined.
      import pyspark
      from pyspark.sql import SQLContext

      people = sc.parallelize(["Michael,30", "Andy,12", "Justin,19"])
      peoplePartsRDD = people.map(lambda p: p.split(","))
      peopleRDD = peoplePartsRDD.map(lambda p: pyspark.sql.Row(name=p[0], age=int(p[1])))
      peopleDF = sqlContext.createDataFrame(peopleRDD)
      peopleDF.first()

      # A second SQLContext on the same SparkContext worked in 1.6.x.
      sqlContext2 = SQLContext(sc)
      people2 = sc.parallelize(["Abcd,40", "Efgh,14", "Ijkl,16"])
      peoplePartsRDD2 = people2.map(lambda p: p.split(","))
      peopleRDD2 = peoplePartsRDD2.map(lambda p: pyspark.sql.Row(fname=p[0], age=int(p[1])))
      peopleDF2 = sqlContext2.createDataFrame(peopleRDD2)  # <==== error here
      

      The error produced is:

      16/12/01 22:35:36 ERROR Schema: Failed initialising database.
      Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
      java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@4494053, see the next exception for details.
              at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
              at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
              at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
      .
      .
      ------
      
      org.datanucleus.exceptions.NucleusDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
      java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see the next exception for details.
              at org.apache.derby.impl.jdb
      .
      .
      .
      NestedThrowables:
      java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
      java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see the next exception for details.
              at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
      .
      .
      .
      Caused by: java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
      java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see the next exception for details.
              at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
              at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
              at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
              at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
              at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
      .
      .
      .
      16/12/01 22:48:09 ERROR Schema: Failed initialising database.
      Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
      java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see the next exception for details.
              at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
              at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
              at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
              at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
              at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
      .
      .
      .
      Caused by: java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see the next exception for details.
              at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
              at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
              at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
              at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
              at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
      .
      .
      .
      
      Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see the next exception for details.
              at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
              at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
              ... 111 more
      Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /Users/vinayak/devel/spark-stc/git_repo/spark-master-x/spark/metastore_db.
              at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
              at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
              at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
              at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
              at java.security.AccessController.doPrivileged(Native Method)
              at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
              at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
              at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
      
      

      The error goes away if sqlContext2 is replaced with sqlContext in the last (failing) line. Because the SQLContext class is retained in 2.x for backward compatibility, this regression breaks scripts and notebooks that create a second SQLContext in this way, even though they ran without error on 1.6.x.
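      The innermost cause in the trace is Derby's ERROR XSDB6: the embedded metastore_db can be booted by only one owner at a time. The sketch below is a toy Python model of that condition, assuming (as the trace suggests, not verified here against Spark's source) that constructing a second SQLContext boots the metastore again while the first context still holds it. EmbeddedMetastore, ToySQLContext and get_or_create are hypothetical names, not Spark or Derby APIs.

```python
class EmbeddedMetastore:
    """Hypothetical stand-in for an embedded Derby database (not Derby's API).

    Only one owner may boot a given database path, which is the condition
    Derby reports as ERROR XSDB6.
    """

    _owners = {}  # database path -> object that booted it

    @classmethod
    def boot(cls, path, owner):
        current = cls._owners.get(path)
        if current is not None and current is not owner:
            raise RuntimeError(
                "XSDB6: Another instance may have already booted the database " + path
            )
        cls._owners[path] = owner


class ToySQLContext:
    """Hypothetical context that boots the metastore when it is constructed."""

    _instance = None

    def __init__(self, metastore_path="metastore_db"):
        EmbeddedMetastore.boot(metastore_path, self)

    @classmethod
    def get_or_create(cls):
        # Reuse the first context so the metastore is booted exactly once.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


# Reusing one context works: the metastore is booted once.
first = ToySQLContext.get_or_create()
again = ToySQLContext.get_or_create()

# Constructing a second, independent context fails, like sqlContext2 in the report.
try:
    ToySQLContext()
    second_context_error = None
except RuntimeError as exc:
    second_context_error = str(exc)

print(second_context_error)
```

      Reusing the first context, as the workaround above does with sqlContext, boots the metastore only once; the failing path corresponds to sqlContext2.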

    People

    • Assignee: Vinayak Joshi (vijoshi)
    • Reporter: Vinayak Joshi (vijoshi)