Description
The Spark SQL documentation's section on Hive support claims that views are supported. However, even basic view operations fail with exceptions related to column resolution.
For example,
// The test table has columns category & num
ctx.sql("create view view1 as select * from test")
ctx.table("view1").printSchema
generates
org.apache.spark.sql.AnalysisException: cannot resolve 'test.col' given input columns category, num; line 1 pos 7
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    ...
You can see a standalone reproducible example with full spark-shell output demonstrating the problem at https://gist.github.com/ssimeonov/57164f9d6b928ba0cfde
The problem is that ctx.sql("create view view1 as select * from test") writes the entry below to the metastore. The entry records cols:[FieldSchema(name:col, type:string, comment:null)] and expands the view text to select `test`.`col` from `default`.`test`, even though the test table has the columns category and num:
15/07/26 15:47:28 INFO HiveMetaStore: 0: create_table: Table(tableName:view1, dbName:default, owner:ubuntu, createTime:1437925648, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:null, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{}, viewOriginalText:select * from test, viewExpandedText:select `test`.`col` from `default`.`test`, tableType:VIRTUAL_VIEW)
15/07/26 15:47:28 INFO audit: ugi=ubuntu ip=unknown-ip-addr cmd=create_table: Table(tableName:view1, dbName:default, owner:ubuntu, createTime:1437925648, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:null, parameters:{}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{}, viewOriginalText:select * from test, viewExpandedText:select `test`.`col` from `default`.`test`, tableType:VIRTUAL_VIEW)
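To make the mismatch concrete, here is a minimal, Spark-free sketch of why the analyzer cannot resolve the view. It compares the columns referenced in the stored viewExpandedText with the columns the test table actually has. The object name ViewColumnCheck and the helper referencedColumns are hypothetical, introduced only for illustration; this is not how Spark's analyzer is implemented, just the same check done by hand on the strings from the log above.

```scala
object ViewColumnCheck {
  // Values copied from the report: the expanded view text stored in the
  // metastore, and the real columns of the underlying test table.
  val viewExpandedText = "select `test`.`col` from `default`.`test`"
  val tableColumns = Seq("category", "num")

  // Hypothetical helper: extract the column part of each back-quoted
  // `table`.`column` reference in the select list (the text before " from ").
  def referencedColumns(expandedText: String): List[String] = {
    val selectList = expandedText.toLowerCase.split(" from ").head
    val ref = "`([^`]+)`\\.`([^`]+)`".r
    ref.findAllMatchIn(selectList).map(_.group(2)).toList
  }

  // Columns the view refers to that the table does not provide.
  val unresolved: List[String] =
    referencedColumns(viewExpandedText).filterNot(c => tableColumns.contains(c))

  def main(args: Array[String]): Unit =
    println(s"unresolved columns: ${unresolved.mkString(", ")}")
}
```

The only unresolved name is col, which matches the 'test.col' in the AnalysisException: the view was stored with a placeholder column instead of category and num.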
Issue Links
- is related to SPARK-9764 Spark SQL uses table metadata inconsistently (Closed)