  Spark > SPARK-27901 Improve the error messages of SQL parser > SPARK-21529

Improve the error message for unsupported Uniontype


Details

    • Type: Sub-task
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Environment: Qubole, DataBricks

    Description

      We encounter errors when attempting to read Hive tables whose schema contains the uniontype. It appears that Catalyst
      does not support the uniontype, which renders such tables unreadable by Spark (2.1). Although uniontype support is arguably
      incomplete in the Hive query engine, it is fully supported by the storage engine and by the Avro data format, which we use
      for these tables. Therefore, I believe it is a valid, usable type construct that should be supported by Spark.
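
      For illustration only (the table and column names below are made up, not from our schema), the asymmetry can be seen directly: Hive will create such a table, while Spark's SQL parser rejects the uniontype column definition outright:

      // Illustrative sketch, assuming a Hive-enabled SparkSession named `spark`.
      // Hive itself accepts: CREATE TABLE union_demo (u uniontype<int,string>) STORED AS AVRO
      // Spark's parser has no grammar rule for uniontype, so the equivalent statement
      // fails with the same kind of generic ParseException shown below.
      spark.sql("CREATE TABLE union_demo (u uniontype<int,string>) STORED AS AVRO")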

      We've attempted to read the table as follows:

      spark.sql("select * from etl.tbl where acquisition_instant='20170706T133545Z' limit 5").show
      val tblread = spark.read.table("etl.tbl")
      

      Both attempts fail with the same error. The pertinent portions of the error message are as follows (full stack trace below):

      org.apache.spark.SparkException: Cannot recognize hive type string: uniontype<struct<a:array<uniontype<struct<b: ...
      
      ...
      
      Caused by: org.apache.spark.sql.catalyst.parser.ParseException: 
      mismatched input '<' expecting
      {<EOF>, '('}
      (line 1, pos 9)
      == SQL ==
      uniontype<struct<a:array<uniontype<struct<b: ...
      ---------^^^
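
      For what it's worth, the same ParseException can be reproduced without any Hive table, because the failure occurs when HiveClientImpl.fromHiveColumn hands the Hive column type string to Catalyst's data type parser. A minimal sketch (CatalystSqlParser is internal API; the type string is just an illustration):

      import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

      // Catalyst's data type grammar has no rule for Hive's uniontype, so this throws
      // the same generic "mismatched input '<' expecting ..." ParseException rather
      // than a message naming uniontype as unsupported.
      CatalystSqlParser.parseDataType("uniontype<int,string>")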
      

      Full stack trace

      org.apache.spark.SparkException: Cannot recognize hive type string: uniontype<struct<a:array<uniontype<struct<b:float,c:float,d:double,e:string,f:string,g:string,h:string,i:string,j:string,k:double,l:double,m:string>>>,n:boolean,o:string,p:bigint,q:string>,struct<r:array<struct<s:string,t:array<uniontype<struct<u:float,v:float,w:double,x:string,y:string,z:string,aa:string,ab:string,ac:string,ad:double,ae:double,af:string>>>>>,ag:boolean,ah:string,ai:bigint,aj:string>>
      at org.apache.spark.sql.hive.client.HiveClientImpl$.fromHiveColumn(HiveClientImpl.scala:800)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:377)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:377)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.Iterator$class.foreach(Iterator.scala:893)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
      at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
      at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.AbstractTraversable.map(Traversable.scala:104)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:377)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:373)
      at scala.Option.map(Option.scala:146)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:373)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:371)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:290)
      at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:231)
      at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:230)
      at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
      at org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:371)
      at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
      at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:79)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
      at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
      at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:648)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:648)
      at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
      at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:647)
      at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:681)
      at org.apache.spark.sql.SparkSession.table(SparkSession.scala:622)
      at org.apache.spark.sql.SparkSession.table(SparkSession.scala:618)
      at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:627)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:50)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:52)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:54)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:56)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:58)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw.<init>(<console>:60)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw.<init>(<console>:62)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw.<init>(<console>:64)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw.<init>(<console>:66)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$eval$.$print$lzycompute(<console>:7)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$eval$.$print(<console>:6)
      Caused by: org.apache.spark.sql.catalyst.parser.ParseException: 
      mismatched input '<' expecting
      {<EOF>, '('}
      (line 1, pos 9)
      == SQL ==
      uniontype<struct<a:array<uniontype<struct<b:float,c:float,d:double,e:string,f:string,g:string,h:string,i:string,j:string,k:double,l:double,m:string>>>,n:boolean,o:string,p:bigint,q:string>,struct<r:array<struct<s:string,t:array<uniontype<struct<u:float,v:float,w:double,x:string,y:string,z:string,aa:string,ab:string,ac:string,ad:double,ae:double,af:string>>>>>,ag:boolean,ah:string,ai:bigint,aj:string>>
      ---------^^^
      at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
      at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
      at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseDataType(ParseDriver.scala:38)
      at org.apache.spark.sql.hive.client.HiveClientImpl$.fromHiveColumn(HiveClientImpl.scala:797)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:377)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:377)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.Iterator$class.foreach(Iterator.scala:893)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
      at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
      at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.AbstractTraversable.map(Traversable.scala:104)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:377)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:373)
      at scala.Option.map(Option.scala:146)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:373)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:371)
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:290)
      at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:231)
      at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:230)
      at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
      at org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:371)
      at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
      at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:79)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
      at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
      at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:648)
      at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:648)
      at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
      at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:647)
      at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:681)
      at org.apache.spark.sql.SparkSession.table(SparkSession.scala:622)
      at org.apache.spark.sql.SparkSession.table(SparkSession.scala:618)
      at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:627)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:50)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:52)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:54)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:56)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:58)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw$$iw.<init>(<console>:60)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw$$iw.<init>(<console>:62)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw$$iw.<init>(<console>:64)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$read$$iw.<init>(<console>:66)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$eval$.$print$lzycompute(<console>:7)
      at linef5f6809a5e21434ea50b8ad706eb0e8e27.$eval$.$print(<console>:6)
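
      As a hedged workaround sketch (the path below is a placeholder, not from our environment): since the underlying data is Avro, the files can be read directly, bypassing the Hive metastore schema that Catalyst cannot parse. The spark-avro reader maps a complex Avro union to a struct of member0, member1, ... fields, so the data remains accessible:

      // Workaround sketch. Requires the spark-avro package (the external
      // "com.databricks.spark.avro" source on Spark 2.x; built in as "avro" since 2.4).
      val direct = spark.read
        .format("avro")
        .load("/warehouse/etl.db/tbl/acquisition_instant=20170706T133545Z")  // placeholder path
      direct.printSchema()  // union columns surface as struct<member0: ..., member1: ...>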
      

      People

        Assignee: Unassigned
        Reporter: Elliot West (teabot)
        Votes: 3
        Watchers: 6
