Description
As the reproduction below shows, filtering on a mixed-case data column of a Hive table backed by parquet silently returns wrong results. We should probably revive https://github.com/apache/spark/pull/14750 to fix this issue and related classes of issues.
The only other alternatives are (1) reconciling the on-disk schemas with the metastore schema at planning time, which seems pretty messy, and (2) fixing all the datasources to support case-insensitive matching, which has its own issues.
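To sketch what alternative (2) would involve: each (lowercased) metastore column name has to be resolved against the mixed-case field names actually recorded on disk. The object and method names below are hypothetical illustrations, not Spark APIs:

```scala
object CaseInsensitiveResolution {
  // Hypothetical sketch: map each lowercased metastore column name back to
  // the mixed-case field name recorded in the parquet footer. Columns with
  // no on-disk counterpart are left unchanged.
  def resolve(metastoreCols: Seq[String], parquetFields: Seq[String]): Seq[String] = {
    val byLowerCase = parquetFields.map(f => f.toLowerCase -> f).toMap
    metastoreCols.map(c => byLowerCase.getOrElse(c.toLowerCase, c))
  }
}
```

For example, `resolve(Seq("normalcol"), Seq("normalCol"))` would recover the on-disk name `normalCol`. The messy part is that every datasource would need to do this consistently.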
Reproduction:
import java.io.File

private def setupPartitionedTable(tableName: String, dir: File): Unit = {
  spark.range(5)
    .selectExpr("id as normalCol", "id as partCol1", "id as partCol2")
    .write
    .partitionBy("partCol1", "partCol2")
    .mode("overwrite")
    .parquet(dir.getAbsolutePath)

  spark.sql(s"""
    |create external table $tableName (normalCol long)
    |partitioned by (partCol1 int, partCol2 int)
    |stored as parquet
    |location "${dir.getAbsolutePath}"""".stripMargin)

  spark.sql(s"msck repair table $tableName")
}

test("filter by mixed case col") {
  withTable("test") {
    withTempDir { dir =>
      setupPartitionedTable("test", dir)
      val df = spark.sql("select * from test where normalCol = 3")
      assert(df.count() == 1)
    }
  }
}
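For context on why the assertion fails: the Hive metastore stores column names lowercased (`normalcol`), while the parquet files written by Spark keep the mixed-case name (`normalCol`). A case-sensitive lookup finds no matching on-disk field, the column reads back as null, and the filter matches zero rows instead of one. A minimal, hypothetical illustration of that lookup (not Spark's actual reader code):

```scala
object FieldLookup {
  // Case-sensitive lookup, mirroring the buggy behavior: the lowercased
  // metastore name never matches the mixed-case on-disk field name.
  def strict(parquetFields: Seq[String], metastoreCol: String): Option[String] =
    parquetFields.find(_ == metastoreCol)

  // Case-insensitive lookup, mirroring what a fix would have to do.
  def insensitive(parquetFields: Seq[String], metastoreCol: String): Option[String] =
    parquetFields.find(_.equalsIgnoreCase(metastoreCol))
}
```

Here `strict(Seq("normalCol"), "normalcol")` comes back empty, which is the root cause of the zero-row result above, while the insensitive variant recovers the field.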
cc cloud_fan