
    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None

      Description

      We should probably revive https://github.com/apache/spark/pull/14750 to fix this issue and the related class of issues.

      The only other alternatives are (1) reconciling the on-disk schema with the metastore schema at planning time, which seems pretty messy, and (2) fixing all the data sources to support case-insensitive column matching, which also has issues.
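
      To illustrate what alternative (2) amounts to: the Hive metastore stores column names lower-cased, so a data source would have to resolve a lower-cased name against the mixed-case names in the Parquet footer. A minimal, self-contained sketch of that lookup (the object and method names here are illustrative, not Spark APIs):

      ```scala
      // Hypothetical sketch of case-insensitive field resolution: match a
      // lower-cased metastore column name against mixed-case on-disk fields.
      object CaseInsensitiveResolution {
        def resolveField(metastoreName: String, onDiskFields: Seq[String]): Option[String] =
          onDiskFields.find(_.equalsIgnoreCase(metastoreName))

        def main(args: Array[String]): Unit = {
          val onDisk = Seq("normalCol", "partCol1", "partCol2")
          // The metastore reports "normalcol"; it must still resolve on disk.
          println(resolveField("normalcol", onDisk)) // Some(normalCol)
          // The hard case such matching has to define semantics for: files whose
          // schemas contain names differing only by case are ambiguous.
          println(resolveField("normalcol", Seq("normalCol", "NORMALCOL")))
        }
      }
      ```

      The ambiguity in the last call (two on-disk fields that collide under case folding) is one reason this alternative has issues of its own.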

      Reproduction:

        private def setupPartitionedTable(tableName: String, dir: File): Unit = {
          // Write Parquet files whose schema uses mixed-case column names.
          spark.range(5).selectExpr("id as normalCol", "id as partCol1", "id as partCol2").write
            .partitionBy("partCol1", "partCol2")
            .mode("overwrite")
            .parquet(dir.getAbsolutePath)

          // The Hive metastore lower-cases these column names, so the table
          // schema no longer matches the mixed-case names in the Parquet files.
          spark.sql(s"""
            |create external table $tableName (normalCol long)
            |partitioned by (partCol1 int, partCol2 int)
            |stored as parquet
            |location "${dir.getAbsolutePath}"""".stripMargin)
          spark.sql(s"msck repair table $tableName")
        }

        test("filter by mixed case col") {
          withTable("test") {
            withTempDir { dir =>
              setupPartitionedTable("test", dir)
              // id = 3 appears exactly once; this filter on the mixed-case
              // normalCol exercises the schema-mismatch bug.
              val df = spark.sql("select * from test where normalCol = 3")
              assert(df.count() == 1)
            }
          }
        }
      

      cc Wenchen Fan

            People

            • Assignee: Wenchen Fan
            • Reporter: Eric Liang
            • Votes: 0
            • Watchers: 5
