SPARK-18355: Spark SQL fails to read data from an ORC Hive table that has a new column added to it


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.2, 2.1.0, 2.2.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: SQL
    • Labels: None
    • Environment: Centos6

    Description

      PROBLEM:

      Spark SQL fails to read data from an ORC Hive table after a new column has been added to it.

      Below is the exception:

      scala> sqlContext.sql("select click_id,search_id from testorc").show
      16/11/03 22:17:53 INFO ParseDriver: Parsing command: select click_id,search_id from testorc
      16/11/03 22:17:54 INFO ParseDriver: Parse Completed
      java.lang.AssertionError: assertion failed
      	at scala.Predef$.assert(Predef.scala:165)
      	at org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:39)
      	at org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:38)
      	at scala.Option.map(Option.scala:145)
      	at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:38)
      	at org.apache.spark.sql.execution.datasources.LogicalRelation.copy(LogicalRelation.scala:31)
      	at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$convertToOrcRelation(HiveMetastoreCatalog.scala:588)
      	at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:647)
      	at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:643)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
      	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
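
      The assertion that fails sits in LogicalRelation's constructor. Roughly
      paraphrased (a sketch of the code referenced by the trace, not a verbatim
      quote), it insists that the attributes expected from the Hive metastore schema
      line up one-to-one with the schema of the ORC relation that
      convertToOrcRelation builds for the table, which stops holding once the
      metastore has gained a column the ORC relation does not know about:

      // Paraphrased sketch of LogicalRelation.scala around the failing line; not verbatim.
      // expectedOutputAttributes reflects the metastore schema, relation.schema the
      // converted ORC relation; after the ALTER their lengths differ, so the bare
      // assert fires with no message.
      val attrs = relation.schema.toAttributes
      expectedOutputAttributes.map { expectedAttrs =>
        assert(expectedAttrs.length == attrs.length)
        // ...reconcile the two attribute lists...
      }.getOrElse(attrs)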
      
      

      STEPS TO REPRODUCE THE ISSUE:

      1) Create the table in Hive (a spark-shell equivalent of the DDL is sketched right after it).

      CREATE TABLE `testorc`( 
      `click_id` string, 
      `search_id` string, 
      `uid` bigint)
      PARTITIONED BY ( 
      `ts` string, 
      `hour` string) 
      STORED AS ORC; 
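
      The same DDL can also be issued from spark-shell through the HiveContext if no
      Hive CLI is at hand (a sketch; it assumes sqlContext is a HiveContext, as in the
      queries above):

      sqlContext.sql("""
        CREATE TABLE testorc (
          click_id string,
          search_id string,
          uid bigint)
        PARTITIONED BY (ts string, hour string)
        STORED AS ORC
      """)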
      

      2) Load data into the table:

      INSERT INTO TABLE testorc PARTITION (ts = '98765',hour = '01' ) VALUES (12,2,12345);
      

      3) Select through spark-shell (this works; the expected output is sketched below).

      sqlContext.sql("select click_id,search_id from testorc").show
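
      With the single row inserted in step 2, the query is expected to print something
      like the following (a sketch; exact formatting can vary by version):

      +--------+---------+
      |click_id|search_id|
      +--------+---------+
      |      12|        2|
      +--------+---------+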
      

      4) Now add a column to the Hive table (a way to inspect the resulting schema mismatch is sketched below).

      ALTER TABLE testorc ADD COLUMNS (dummy string);
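
      After this ALTER, the metastore reports four data columns for the table, while
      the ORC files written in step 2 still contain only the original three. One way
      to see the mismatch from spark-shell (a sketch; the warehouse path is
      illustrative, adjust it to your layout):

      // Metastore view of the table: click_id, search_id, uid, dummy (+ partition columns).
      sqlContext.sql("DESCRIBE testorc").show(100, false)

      // Physical view of the files written before the ALTER: click_id, search_id, uid only.
      sqlContext.read.format("orc")
        .load("/apps/hive/warehouse/testorc/ts=98765/hour=01")
        .printSchema()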
      

      5) Now select from spark-shell again; it fails with the assertion error shown above:

      scala> sqlContext.sql("select click_id,search_id from testorc").show
      16/11/03 22:17:53 INFO ParseDriver: Parsing command: select click_id,search_id from testorc
      16/11/03 22:17:54 INFO ParseDriver: Parse Completed
      java.lang.AssertionError: assertion failed
      	at scala.Predef$.assert(Predef.scala:165)
      	at org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:39)
      	at org.apache.spark.sql.execution.datasources.LogicalRelation$$anonfun$1.apply(LogicalRelation.scala:38)
      	at scala.Option.map(Option.scala:145)
      	at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:38)
      	at org.apache.spark.sql.execution.datasources.LogicalRelation.copy(LogicalRelation.scala:31)
      	at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$convertToOrcRelation(HiveMetastoreCatalog.scala:588)
      	at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:647)
      	at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:643)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
      	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
      	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
      	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
      	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
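
      A possible mitigation on affected versions (not part of the original report, and
      assuming the build exposes the spark.sql.hive.convertMetastoreOrc setting that
      gates the OrcConversions rule in the trace): disable the metastore ORC
      conversion so the query stays on the Hive SerDe read path and never reaches
      convertToOrcRelation:

      // Workaround sketch: keep metastore ORC tables on the Hive SerDe reader instead
      // of converting them to Spark's native ORC relation.
      sqlContext.setConf("spark.sql.hive.convertMetastoreOrc", "false")
      sqlContext.sql("select click_id,search_id from testorc").show()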
      


    People

      Assignee: Dongjoon Hyun (dongjoon)
      Reporter: Sandeep Nemuri
      Votes: 1
      Watchers: 7

    Dates

      Created:
      Updated:
      Resolved: