I noticed some critical changes to my Hive tables and traced them to a simple SELECT in Spark SQL. Looking at the logs, I found that this SELECT was actually performing an update on the database, logged as "Saving case-sensitive schema for table".
I then found out that Spark 2.2.0 introduces a new default value for spark.sql.hive.caseSensitiveInferenceMode (see
The issue is that this update changes critical metadata of the table, in particular:
- changes the owner to the current user
- removes bucketing metadata (BUCKETING_COLS, SDS)
- removes sorting metadata (SORT_COLS)
Setting the property to NEVER_INFER prevents the issue.
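For reference, one way to apply this setting is at session creation time. The following is only a minimal PySpark sketch, assuming pyspark is installed and a Hive metastore is reachable; the helper names never_infer_conf and build_session and the application name are hypothetical and not part of Spark.

```python
# Property whose default value changed in Spark 2.2.0 (named in the report above).
INFERENCE_MODE_KEY = "spark.sql.hive.caseSensitiveInferenceMode"


def never_infer_conf():
    """Return the setting that stops Spark from inferring and saving the schema."""
    return {INFERENCE_MODE_KEY: "NEVER_INFER"}


def build_session(app_name="never-infer-example"):
    """Create a Hive-enabled SparkSession with NEVER_INFER applied.

    Hypothetical helper: requires pyspark and a reachable Hive metastore.
    """
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in never_infer_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The same key/value pair can also be passed with --conf on the spark-submit command line or placed in spark-defaults.conf, which avoids touching application code.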
Also, note that the damage can be fixed manually in Hive with e.g.:
In Spark 2.1.x (branch-2.1), NEVER_INFER is used. The Spark 2.3 (master) branch is not affected, thanks to SPARK-17729. This is a regression in Spark 2.2 only. By default, Parquet Hive tables are affected, and only Hive may suffer from this.