SPARK-21912: ORC/Parquet table should not create invalid column names

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

      Description

      Currently, creating an ORC data source table with an invalid column name (for example, a name containing a space) aborts the write job at execution time. We should prevent this earlier by raising an AnalysisException, as Parquet data source tables already do. A sketch of the intended check follows the reproduction below.

      scala> sql("CREATE TABLE orc1 USING ORC AS SELECT 1 `a b`")
      17/09/04 13:28:21 ERROR Utils: Aborting task
      java.lang.IllegalArgumentException: Error: : expected at the position 8 of 'struct<a b:int>' but ' ' is found.
      	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:360)
      ...
      17/09/04 13:28:21 WARN FileOutputCommitter: Could not delete file:/Users/dongjoon/spark-release/spark-master/spark-warehouse/orc1/_temporary/0/_temporary/attempt_20170904132821_0001_m_000000_0
      17/09/04 13:28:21 ERROR FileFormatWriter: Job job_20170904132821_0001 aborted.
      17/09/04 13:28:21 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
      org.apache.spark.SparkException: Task failed while writing rows.
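
      A minimal sketch of the intended guard, in plain Scala with no Spark dependency: validate every top-level field name before the write job starts, instead of letting Hive's type-string parser fail mid-task. The object and method names here are illustrative, and the rejected character set is an assumption modeled on the characters Parquet's existing field-name check rejects; the real fix would raise AnalysisException during analysis rather than IllegalArgumentException at call time.

      object ColumnNameCheck {
        // Characters that cannot appear in a Hive type string such as 'struct<a b:int>'.
        // This set mirrors the one reported by Parquet's field-name check (an assumption here).
        private val invalidChars = " ,;{}()\n\t="

        // Reject a single field name, naming the offending column in the error message.
        def checkFieldName(name: String): Unit = {
          if (name.exists(c => invalidChars.contains(c))) {
            throw new IllegalArgumentException(
              s"Column name '$name' contains invalid character(s); use an alias to rename it.")
          }
        }

        // Validate all field names up front, before any task starts writing files.
        def checkFieldNames(names: Seq[String]): Unit = names.foreach(checkFieldName)
      }

      With a guard like this wired into table creation, the CREATE TABLE above would fail fast with a clear message pointing at column `a b`, instead of aborting tasks partway through the write and leaving _temporary files behind.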
      

      People

      • Assignee: Dongjoon Hyun
      • Reporter: Dongjoon Hyun
      • Votes: 0
      • Watchers: 3
