Spark / SPARK-32618

ORC writer doesn't support colon in column names


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None

      Description

      Hi,

      I'm getting IllegalArgumentException: Can't parse category at 'struct<a:b^:int>' when writing to ORC a DataFrame whose column names contain a colon (:). The snippet below reproduces it. The same problem occurs when the name containing a colon is nested as a field of a struct.

      This seems related to SPARK-21791 (which was fixed in 2.3.0).

      In my real-life case, the column was actually named xsi:type and came from parsed XML, so other users may be affected too.

      Has it been fixed after Spark 2.3.0? (sorry, can't test easily)

      Is there any workaround? Replacing every colon with an underscore in the column names would be acceptable for me, but that is not easy to do across a large set of nested struct columns...

      Thanks

       

       

       // Run in spark-shell, where `spark` and spark.implicits._ are already in scope
       spark.conf.set("spark.sql.orc.impl", "native")

       // Case 1: top-level column whose name contains a colon
       val dfColon = Seq(1).toDF("a:b")
       dfColon.printSchema()
       dfColon.show()
       dfColon.write.orc("test_colon")
       // Fails with IllegalArgumentException: Can't parse category at 'struct<a:b^:int>'

       // Case 2: the same column nested inside a struct
       import org.apache.spark.sql.functions.struct
       val dfColonStruct = dfColon.withColumn("x", struct($"a:b")).drop("a:b")
       dfColonStruct.printSchema()
       dfColonStruct.show()
       dfColonStruct.write.orc("test_colon_struct")
       // Fails with IllegalArgumentException: Can't parse category at 'struct<x:struct<a:b^:int>>'
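       For the rename workaround asked about above (replacing colons with underscores, including inside nested structs), one possible approach is to rewrite the schema recursively and re-apply it to the same rows. This is only a sketch and has not been tested on Spark 2.3.0. The object name ColonFreeSchema and its two methods are hypothetical helpers, not part of Spark. It assumes that no two sibling fields collide after renaming (for example a:b and a_b in the same struct).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}

// Hypothetical helper, not a Spark API.
object ColonFreeSchema {

  // Recursively rebuild a data type, replacing ':' with '_' in every
  // struct field name. Arrays and maps are traversed so that structs
  // nested inside them are renamed too.
  def sanitize(dt: DataType): DataType = dt match {
    case s: StructType =>
      StructType(s.fields.map { f =>
        f.copy(name = f.name.replace(':', '_'), dataType = sanitize(f.dataType))
      })
    case ArrayType(elementType, containsNull) =>
      ArrayType(sanitize(elementType), containsNull)
    case MapType(keyType, valueType, valueContainsNull) =>
      MapType(sanitize(keyType), sanitize(valueType), valueContainsNull)
    case other => other
  }

  // Re-apply the renamed schema to the existing rows. Rows are matched to
  // the schema by position, so only the field names change. Assumption:
  // the round trip through the RDD API is acceptable for the data size.
  def renameColons(spark: SparkSession, df: DataFrame): DataFrame = {
    val cleanSchema = sanitize(df.schema).asInstanceOf[StructType]
    spark.createDataFrame(df.rdd, cleanSchema)
  }
}
```

       With the reproduction above, ColonFreeSchema.renameColons(spark, dfColonStruct).write.orc("test_colon_struct") would then write a schema whose nested field is named a_b instead of a:b. The original names are lost in the output, so a reader of the ORC files needs to know about the substitution.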
      

       

       

            People

            • Assignee: Unassigned
            • Reporter: Pierre Gramme