SPARK-20808

External Table unnecessarily not created in Hive-compatible way


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 2.1.0, 2.1.1
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      In Spark 2.1.0 and 2.1.1, spark.catalog.createExternalTable unnecessarily creates tables in a Hive-incompatible way.

      For instance, executing the following in a spark-shell

      val database = "default"
      val table = "table_name"
      val path = "/user/daki/" + database + "/" + table

      // Write some sample data as Parquet files to `path`
      val data = Array(("Alice", 23), ("Laura", 33), ("Peter", 54))
      val df = sc.parallelize(data).toDF("name", "age")

      df.write.mode(org.apache.spark.sql.SaveMode.Overwrite).parquet(path)

      spark.sql("DROP TABLE IF EXISTS " + database + "." + table)

      // Register the existing Parquet files as an external table
      spark.catalog.createExternalTable(database + "." + table, path)
      

      issues the following warning:

      Search Subject for Kerberos V5 INIT cred (<<DEF>>, sun.security.jgss.krb5.Krb5InitCredential)
      17/05/19 11:01:17 WARN hive.HiveExternalCatalog: Could not persist `default`.`table_name` in a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific format.
      org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:User daki does not have privileges for CREATETABLE)
      	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:720)
      ...
      

      The exception (user does not have privileges for CREATETABLE) is misleading, since I do have the CREATE TABLE privilege.

      Querying the table from Hive does not return any results, while from Spark the data can be accessed.
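
      A rough way to see how the table actually ended up in the metastore (a sketch only, assuming the same spark-shell session and table name as above); DESCRIBE FORMATTED should reveal the Spark SQL specific table properties instead of a plain Parquet table definition that Hive could read:

      // Inspect the persisted table metadata; the Hive-incompatible variant carries
      // Spark SQL specific table properties rather than a regular Parquet SerDe definition.
      spark.sql("DESCRIBE FORMATTED " + database + "." + table).show(100, truncate = false)

      // From Spark the data itself remains readable:
      spark.table(database + "." + table).show()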

      The following code creates the table correctly (workaround):

      def sqlStatement(df: org.apache.spark.sql.DataFrame, database: String, table: String, path: String): String = {
        // Build the Hive column list from the DataFrame schema
        val columns = df.schema
          .map(col => "`" + col.name + "` " + col.dataType.simpleString)
          .mkString(",\n")
        ("CREATE EXTERNAL TABLE `%s`.`%s` (%s) " +
          "STORED AS PARQUET " +
          "LOCATION 'hdfs://nameservice1%s'").format(database, table, columns, path)
      }

      spark.sql("DROP TABLE IF EXISTS " + database + "." + table)
      spark.sql(sqlStatement(df, database, table, path))
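
      As a possible variation (a sketch only, not verified on this cluster), the catalog API also accepts the data source and options explicitly via the createExternalTable(tableName, source, options) overload; whether this changes the Hive-compatibility outcome was not checked:

      // Sketch: pass the data source and the path as an option instead of only a path.
      spark.catalog.createExternalTable(
        database + "." + table,
        "parquet",
        Map("path" -> path))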
      

      The code is executed via YARN against a Cloudera CDH 5.7.5 cluster with Sentry enabled (in case this matters regarding the privilege warning). Spark was built against the CDH libraries.


          People

            Assignee: Unassigned
            Reporter: Joachim Hereth (jhereth)
            Votes: 0
            Watchers: 2
