Spark / SPARK-9342

Spark SQL views don't work


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Environment: Ubuntu on AWS

    Description

      The Spark SQL documentation's section on Hive support claims that views are supported. However, even basic view operations fail with exceptions related to column resolution.

      For example,

      // ctx is a HiveContext; the test table has columns category and num
      ctx.sql("create view view1 as select * from test")
      ctx.table("view1").printSchema
      

      generates

      org.apache.spark.sql.AnalysisException: cannot resolve 'test.col' given input columns category, num; line 1 pos 7
      	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
              ...
      

      A standalone, reproducible example with full spark-shell output demonstrating the problem is available at https://gist.github.com/ssimeonov/57164f9d6b928ba0cfde

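      The problem is that ctx.sql("create view view1 as select * from test") stores the following definition in the metastore. Even though the test table has columns category and num, the view is recorded with cols:[FieldSchema(name:col, type:string, comment:null)] and with viewExpandedText:select `test`.`col` from `default`.`test`, which is exactly the column reference the analyzer later fails to resolve: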

      15/07/26 15:47:28 INFO HiveMetaStore: 0: create_table: Table(tableName:view1, dbName:default, owner:ubuntu, createTime:1437925648, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:null, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{}, viewOriginalText:select * from test, viewExpandedText:select `test`.`col` from `default`.`test`, tableType:VIRTUAL_VIEW)
      15/07/26 15:47:28 INFO audit: ugi=ubuntu	ip=unknown-ip-addr	cmd=create_table: Table(tableName:view1, dbName:default, owner:ubuntu, createTime:1437925648, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:null, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{}, viewOriginalText:select * from test, viewExpandedText:select `test`.`col` from `default`.`test`, tableType:VIRTUAL_VIEW)
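To make the failure mode concrete, here is a simplified model in Scala. It is a hypothetical illustration, not Spark's analyzer: it assumes the view's stored expanded text boils down to the single column reference `test.col` (as in the log above), and that resolution just checks each referenced column name against the columns Spark SQL actually sees for the table (category and num). The names `ViewResolutionSketch`, `Relation` and `resolve` are invented for this sketch. Under those assumptions it produces the same message as the AnalysisException.

```scala
// Hypothetical, simplified model of the failure; NOT Spark's actual analyzer code.
object ViewResolutionSketch {
  // A relation and the column names a query over it can see.
  final case class Relation(name: String, columns: Seq[String])

  // Resolve a qualified reference such as "test.col" by its last name part
  // against the relation's real columns.
  def resolve(ref: String, input: Relation): Either[String, String] = {
    val column = ref.split('.').last // split on a Char, not a regex
    if (input.columns.contains(column)) Right(column)
    else Left(s"cannot resolve '$ref' given input columns ${input.columns.mkString(", ")}")
  }

  def main(args: Array[String]): Unit = {
    // Column references taken from the view's expanded text in the metastore.
    val expandedRefs = Seq("test.col")
    // Columns Spark SQL actually sees for the test table.
    val test = Relation("test", Seq("category", "num"))
    expandedRefs.map(ref => resolve(ref, test)).foreach(println)
    // prints: Left(cannot resolve 'test.col' given input columns category, num)
  }
}
```

The point of the model is that the view definition was frozen against a schema (a single col column) that does not match the table's real schema, so any later resolution of the expanded text against category and num must fail.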
      


    People

    • Assignee: Xiao Li (smilegator)
    • Reporter: Simeon Simeonov (simeons)
