Spark / SPARK-20367

Spark silently escapes partition column names


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.0, 2.2.0
    • Fix Version/s: 2.2.0, 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      CSV files can have arbitrary column names:

      scala> spark.range(1).select(col("id").as("Column?"), col("id")).write.option("header", true).csv("/tmp/foo")
      scala> spark.read.option("header", true).csv("/tmp/foo").schema
      res1: org.apache.spark.sql.types.StructType = StructType(StructField(Column?,StringType,true), StructField(id,StringType,true))
      

      However, once a column with characters like "?" in its name is used as a partitioning column, the name is silently escaped, and reading the schema back returns the column name with "?" turned into "%3F":

      scala> spark.range(1).select(col("id").as("Column?"), col("id")).write.partitionBy("Column?").option("header", true).csv("/tmp/bar")
      scala> spark.read.option("header", true).csv("/tmp/bar").schema
      res3: org.apache.spark.sql.types.StructType = StructType(StructField(id,StringType,true), StructField(Column%3F,IntegerType,true))
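
The behaviour above is consistent with the partition column name being percent-escaped when it is written into a directory name (something like `Column%3F=0`) and then not being unescaped when the schema is inferred from the path. The sketch below is a simplified, hypothetical model of that round trip in plain Scala. The object name, the function names, and the set of escaped characters are assumptions for illustration, not Spark's actual implementation.

```scala
// Hypothetical model of percent-escaping names used in partition directories.
// The character set is an illustrative subset, not Spark's real list.
object PartitionNameEscaping {
  private val unsafe: Set[Char] = Set('?', '%', '=', '/', ':', '#', '*', '"', '\\')

  // Replace each unsafe character with '%' plus its two-digit uppercase hex code.
  def escape(name: String): String = {
    val sb = new StringBuilder
    name.foreach { c =>
      if (unsafe.contains(c)) sb.append(f"%%${c.toInt}%02X") else sb.append(c)
    }
    sb.toString
  }

  // Reverse the escaping: turn each "%XY" (two hex digits) back into one character.
  def unescape(name: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < name.length) {
      val c = name.charAt(i)
      val hasTwoHexDigits =
        c == '%' && i + 2 < name.length &&
          name.substring(i + 1, i + 3).forall(h => Character.digit(h, 16) >= 0)
      if (hasTwoHexDigits) {
        sb.append(Integer.parseInt(name.substring(i + 1, i + 3), 16).toChar)
        i += 3
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val onDisk = escape("Column?")   // name as it would appear in the directory
    println(onDisk)                  // Column%3F
    println(unescape(onDisk))        // Column?
  }
}
```

In this model, a reader that skips the `unescape` step reports `Column%3F` instead of `Column?`, which matches the schema shown in `res3`.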
      

      The same happens with other formats, but I ran into it with CSV, since CSV files more often have unusual column names.

      I'm not sure whether this is a bug or a feature, but it might be more intuitive to fail a query whose partitioning column name contains invalid characters than to silently escape the name.
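
As a rough illustration of the fail-fast alternative suggested above, a caller could validate partition column names before writing. This is a minimal sketch in plain Scala. The object name, the helper names, and the rule for what counts as a safe name (letters, digits, underscores) are all assumptions, and none of it is taken from Spark.

```scala
// Hypothetical pre-write check for partition column names.
object PartitionNameCheck {
  // Assumption: only ASCII letters, digits, and underscores are treated as safe.
  private val SafeName = "[A-Za-z0-9_]+".r

  // Return the column names that do not match the safe pattern.
  def invalidPartitionColumns(cols: Seq[String]): Seq[String] =
    cols.filterNot(c => SafeName.pattern.matcher(c).matches())

  // Throw IllegalArgumentException if any name would need escaping.
  def requireSafePartitionColumns(cols: Seq[String]): Unit = {
    val bad = invalidPartitionColumns(cols)
    require(bad.isEmpty, s"Unsafe partition column name(s): ${bad.mkString(", ")}")
  }
}
```

A caller would run `PartitionNameCheck.requireSafePartitionColumns(Seq("Column?"))` before `write.partitionBy("Column?")`, so that the query fails with an explicit message instead of producing an escaped name.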

          People

            Assignee: Juliusz Sompolski (juliuszsompolski)
            Reporter: Juliusz Sompolski (juliuszsompolski)