
    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None

      Description

      We should probably revive https://github.com/apache/spark/pull/14750 to fix this issue and the related class of issues.

      The only other alternatives are (1) reconciling the on-disk schema with the metastore schema at planning time, which seems pretty messy, and (2) fixing all the data sources to support case-insensitive column matching, which also has issues.
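
      To illustrate what alternative (2) amounts to: the Hive metastore stores column names lower-cased, so a data source would have to resolve a lower-cased name against the mixed-case names in the Parquet footer. A minimal, self-contained sketch of that lookup (the object and method names here are illustrative, not Spark APIs):

      ```scala
      // Hypothetical sketch of case-insensitive field resolution: match a
      // lower-cased metastore column name against mixed-case on-disk fields.
      object CaseInsensitiveResolution {
        def resolveField(metastoreName: String, onDiskFields: Seq[String]): Option[String] =
          onDiskFields.find(_.equalsIgnoreCase(metastoreName))

        def main(args: Array[String]): Unit = {
          val onDisk = Seq("normalCol", "partCol1", "partCol2")
          // The metastore reports "normalcol"; it must still resolve on disk.
          println(resolveField("normalcol", onDisk)) // Some(normalCol)
          // The hard case such matching has to define semantics for: files whose
          // schemas contain names differing only by case are ambiguous.
          println(resolveField("normalcol", Seq("normalCol", "NORMALCOL")))
        }
      }
      ```

      The ambiguity in the last call (two on-disk fields that collide under case folding) is one reason this alternative has issues of its own.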

      Reproduction:

        private def setupPartitionedTable(tableName: String, dir: File): Unit = {
          // Write Parquet files whose schema uses mixed-case column names.
          spark.range(5).selectExpr("id as normalCol", "id as partCol1", "id as partCol2").write
            .partitionBy("partCol1", "partCol2")
            .mode("overwrite")
            .parquet(dir.getAbsolutePath)

          // The Hive metastore lower-cases these column names, so the table
          // schema no longer matches the mixed-case names in the Parquet files.
          spark.sql(s"""
            |create external table $tableName (normalCol long)
            |partitioned by (partCol1 int, partCol2 int)
            |stored as parquet
            |location "${dir.getAbsolutePath}"""".stripMargin)
          spark.sql(s"msck repair table $tableName")
        }

        test("filter by mixed case col") {
          withTable("test") {
            withTempDir { dir =>
              setupPartitionedTable("test", dir)
              // id = 3 appears exactly once; this filter on the mixed-case
              // normalCol exercises the schema-mismatch bug.
              val df = spark.sql("select * from test where normalCol = 3")
              assert(df.count() == 1)
            }
          }
        }
      

      cc Wenchen Fan

            People

            • Assignee: Wenchen Fan
            • Reporter: Eric Liang
            • Votes: 0
            • Watchers: 5
