SPARK-9342

Spark SQL views don't work

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Environment: Ubuntu on AWS

    Description

      The Spark SQL documentation's section on Hive support claims that views are supported. However, even basic view operations fail with exceptions related to column resolution.

      For example,

      // The test table has columns category & num
      ctx.sql("create view view1 as select * from test")
      ctx.table("view1").printSchema
      

      generates

      org.apache.spark.sql.AnalysisException: cannot resolve 'test.col' given input columns category, num; line 1 pos 7
      	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
              ...
      

      You can see a standalone reproducible example with full spark-shell output demonstrating the problem at https://gist.github.com/ssimeonov/57164f9d6b928ba0cfde

      The problem is that ctx.sql("create view view1 as select * from test") stores the table definition below in the metastore. The definition contains cols:[FieldSchema(name:col, type:string, comment:null)] and viewExpandedText:select `test`.`col` from `default`.`test`, even though the test table has the columns category and num, not col:

      15/07/26 15:47:28 INFO HiveMetaStore: 0: create_table: Table(tableName:view1, dbName:default, owner:ubuntu, createTime:1437925648, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:null, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{}, viewOriginalText:select * from test, viewExpandedText:select `test`.`col` from `default`.`test`, tableType:VIRTUAL_VIEW)
      15/07/26 15:47:28 INFO audit: ugi=ubuntu	ip=unknown-ip-addr	cmd=create_table: Table(tableName:view1, dbName:default, owner:ubuntu, createTime:1437925648, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:null, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{}, viewOriginalText:select * from test, viewExpandedText:select `test`.`col` from `default`.`test`, tableType:VIRTUAL_VIEW)
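
      The mismatch can be checked mechanically from the log output. The following is a minimal sketch in plain Scala, not part of Spark: the object ViewColumnCheck and its method recordedColumns are hypothetical names introduced here for illustration, and the log line is an abbreviated copy of the create_table entry above. It extracts the FieldSchema names recorded for the view and compares them with the columns the test table is stated to have.

```scala
// Hypothetical helper, not part of Spark or Hive: extracts the column names
// recorded in a HiveMetaStore create_table log line.
object ViewColumnCheck {
  // Matches the name of each FieldSchema(name:<name>, ...) entry.
  private val FieldName = """FieldSchema\(name:([^,]+),""".r

  def recordedColumns(logLine: String): List[String] =
    FieldName.findAllMatchIn(logLine).map(_.group(1)).toList

  def main(args: Array[String]): Unit = {
    // Abbreviated copy of the log line above; only the cols fragment matters.
    val logLine =
      "create_table: Table(tableName:view1, sd:StorageDescriptor(" +
        "cols:[FieldSchema(name:col, type:string, comment:null)], location:null))"

    // Columns of the test table, as stated in the description.
    val expected = List("category", "num")

    val recorded = recordedColumns(logLine)
    val missing = expected.filterNot(c => recorded.contains(c))

    println(s"recorded in metastore: ${recorded.mkString(", ")}")
    println(s"missing from view definition: ${missing.mkString(", ")}")
  }
}
```

      With the log line above, the only recorded column is col, and both category and num are reported as missing, which matches the AnalysisException.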
      

            People

              Assignee: Xiao Li (smilegator)
              Reporter: Simeon Simeonov (simeons)
              Votes: 5
              Watchers: 4
