[SPARK-9342] Spark SQL views don't work - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3.1
Fix Version/s: 2.0.0
Component/s: SQL
Labels:
- sql
- views
Environment: Ubuntu on AWS

Description

The Spark SQL documentation's section on Hive support claims that views are supported. However, even basic view operations fail with exceptions related to column resolution.

For example,

// The test table has columns category & num
ctx.sql("create view view1 as select * from test")
ctx.table("view1").printSchema

fails with the following exception:

org.apache.spark.sql.AnalysisException: cannot resolve 'test.col' given input columns category, num; line 1 pos 7
	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
	...

A standalone reproduction with full spark-shell output is available at https://gist.github.com/ssimeonov/57164f9d6b928ba0cfde
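For readers who want to try this without the gist, here is a minimal sketch of what a reproduction might look like in spark-shell. It is not copied from the gist. The sample rows and the use of saveAsTable to create the test table are assumptions, since this report only states that the table has the columns category and num. It assumes Spark 1.3.x built with Hive support, where `sc` is predefined by the shell.

```scala
// Run inside spark-shell (Spark 1.3.x with Hive support); `sc` is provided by the shell.
import org.apache.spark.sql.hive.HiveContext

val ctx = new HiveContext(sc)
import ctx.implicits._

// Assumed setup: the report does not say how `test` was created or what it contains.
// saveAsTable fails if a table named `test` already exists.
val df = sc.parallelize(Seq(("a", 1), ("b", 2))).toDF("category", "num")
df.saveAsTable("test")

// The two statements from the report.
ctx.sql("create view view1 as select * from test")

// On an affected version, this is the call the report says fails with AnalysisException.
ctx.table("view1").printSchema
```

If the table is created some other way, for example with Hive DDL, the behaviour may differ, so treat the setup lines as a starting point rather than a confirmed trigger.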

The problem is that ctx.sql("create view view1 as select * from test") stores the view in the metastore with cols:[FieldSchema(name:col, type:string, comment:null)], even though the test table has the columns category and num. The metastore log shows:

15/07/26 15:47:28 INFO HiveMetaStore: 0: create_table: Table(tableName:view1, dbName:default, owner:ubuntu, createTime:1437925648, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:null, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{}, viewOriginalText:select * from test, viewExpandedText:select `test`.`col` from `default`.`test`, tableType:VIRTUAL_VIEW)
15/07/26 15:47:28 INFO audit: ugi=ubuntu	ip=unknown-ip-addr	cmd=create_table: Table(tableName:view1, dbName:default, owner:ubuntu, createTime:1437925648, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:null, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{}, viewOriginalText:select * from test, viewExpandedText:select `test`.`col` from `default`.`test`, tableType:VIRTUAL_VIEW)