Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-17861 Store data source partitions in metastore and push partition pruning into metastore
  3. SPARK-17990

ALTER TABLE ... ADD PARTITION does not play nice with mixed-case partition column names

    XMLWordPrintableJSON

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      Linux
      Mac OS with a case-sensitive filesystem

    • Target Version/s:

      Description

      Writing partition data to an external table's file location and then adding those as table partition metadata is a common use case. However, for tables with partition column names with upper case letters, the SQL command ALTER TABLE ... ADD PARTITION does not work, as illustrated in the following example:

      scala> sql("create external table mixed_case_partitioning (a bigint) PARTITIONED BY (partCol bigint) STORED AS parquet LOCATION '/tmp/mixed_case_partitioning'")
      res0: org.apache.spark.sql.DataFrame = []
      
      scala> spark.sqlContext.range(10).selectExpr("id as a", "id as partCol").write.partitionBy("partCol").mode("overwrite").parquet("/tmp/mixed_case_partitioning")
      

      At this point, doing a hadoop fs -ls /tmp/mixed_case_partitioning produces the following:

      [msa@jupyter ~]$ hadoop fs -ls /tmp/mixed_case_partitioning
      Found 11 items
      -rw-r--r--   3 msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/_SUCCESS
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=0
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=1
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=2
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=3
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=4
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=5
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=6
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=7
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=8
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=9
      

      Returning to the Spark shell, we execute the following to add the partition metadata:

      scala> (0 to 9).foreach { p => sql(s"alter table mixed_case_partitioning add partition(partCol=$p)") }
      

      Examining the HDFS file listing again, we see:

      [msa@jupyter ~]$ hadoop fs -ls /tmp/mixed_case_partitioning
      Found 21 items
      -rw-r--r--   3 msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/_SUCCESS
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=0
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=1
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=2
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=3
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=4
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=5
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=6
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=7
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=8
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:52 /tmp/mixed_case_partitioning/partCol=9
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=0
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=1
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=2
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=3
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=4
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=5
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=6
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=7
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=8
      drwxr-xr-x   - msa supergroup          0 2016-10-18 17:53 /tmp/mixed_case_partitioning/partcol=9
      

      Note that msck repair table mixed_case_partitioning does not exhibit this behavior—it handles this use case correctly.

        Attachments

          Activity

            People

            • Assignee:
              cloud_fan Wenchen Fan
              Reporter:
              michael Michael MacFadden
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: