Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.1.0, 2.2.0
Fix Version/s: None
Component/s: SQL
Labels:
If you create a table in Spark SQL but then modify the table in Hive to add a column, Spark SQL doesn't pick up the new column.
Basic example:
t1 = spark.sql("select ip_address from mydb.test_table limit 1")
t1.show()
+------------+
|  ip_address|
+------------+
|   1.30.25.5|
+------------+

t1.write.saveAsTable('mydb.t1')

In Hive:
alter table mydb.t1 add columns (bcookie string)

t1 = spark.table("mydb.t1")
t1.show()
+------------+
|  ip_address|
+------------+
|   1.30.25.5|
+------------+
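The stale schema can also be seen directly from Spark's side (same table names as above, just an illustrative check):

spark.table("mydb.t1").printSchema()   # should still list only ip_address, no bcookie,
                                       # even though Hive itself shows bcookie after the alter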
It looks like it's because Spark SQL is picking up the schema from spark.sql.sources.schema.part.0 rather than from Hive.
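One way to confirm this is to look at the table properties Spark wrote to the metastore; for the saveAsTable-created table the serialized schema shows up there (hypothetical check, reusing mydb.t1 from the example):

props = spark.sql("show tblproperties mydb.t1")
props.filter(props.key.startswith("spark.sql.sources.schema")).show(truncate=False)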
Interestingly enough, it appears that if you create the table differently, for example:

spark.sql("create table mydb.t1 select ip_address from mydb.test_table limit 1")

then run your alter table on mydb.t1 in Hive, and then:

val t1 = spark.table("mydb.t1")

it works properly (a consolidated sketch of this path follows below).
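Putting those steps together, a minimal end-to-end sketch of the path that does pick up the Hive change (the table name mydb.t1_ctas is made up here to avoid clashing with the table created above):

spark.sql("create table mydb.t1_ctas select ip_address from mydb.test_table limit 1")
# in Hive: alter table mydb.t1_ctas add columns (bcookie string)
t1 = spark.table("mydb.t1_ctas")
t1.printSchema()   # the new bcookie column should now show up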
It looks like the difference is that, in the case that doesn't work, spark.sql.sources.provider=parquet is set on the table. It's doing this from createDataSourceTable, where the provider is parquet.
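As a quick way to tell which path a given table took, the provider key can be read back directly (again just an illustrative query against the tables from this example):

spark.sql("show tblproperties mydb.t1 ('spark.sql.sources.provider')").show(truncate=False)
# the saveAsTable-created table reports parquet; the CTAS-created one should come back without this key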