SPARK-16415

[Spark][SQL] - Failed to create table due to catalog string error

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

      Description

      Creating the ORC table below from an external table with the following schema fails in Spark SQL with an error on the struct type string.

      SQL:

      CREATE EXTERNAL TABLE date_dim_temporary
        ( d_date_sk                 bigint              --not null
        , d_date_id                 string              --not null
        , d_date                    string
        , d_month_seq               int
        , d_week_seq                int
        , d_quarter_seq             int
        , d_year                    int
        , d_dow                     int
        , d_moy                     int
        , d_dom                     int
        , d_qoy                     int
        , d_fy_year                 int
        , d_fy_quarter_seq          int
        , d_fy_week_seq             int
        , d_day_name                string
        , d_quarter_name            string
        , d_holiday                 string
        , d_weekend                 string
        , d_following_holiday       string
        , d_first_dom               int
        , d_last_dom                int
        , d_same_day_ly             int
        , d_same_day_lq             int
        , d_current_day             string
        , d_current_week            string
        , d_current_month           string
        , d_current_quarter         string
        , d_current_year            string
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
        STORED AS TEXTFILE LOCATION '/user/root/benchmarks/test/data/date_dim';
      
      CREATE TABLE date_dim
      STORED AS ORC
      AS
      SELECT * FROM date_dim_temporary;
      

      Error Message:

      16/07/05 23:38:43 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 198.0 (TID 677, hw-node5): java.lang.IllegalArgumentException: Error: : expected at the position 400 of 'struct<d_date_sk:bigint,d_date_id:string,d_date:string,d_month_seq:int,d_week_seq:int,d_quarter_seq:int,d_year:int,d_dow:int,d_moy:int,d_dom:int,d_qoy:int,d_fy_year:int,d_fy_quarter_seq:int,d_fy_week_seq:int,d_day_name:string,d_quarter_name:string,d_holiday:string,d_weekend:string,d_following_holiday:string,d_first_dom:int,d_last_dom:int,d_same_day_ly:int,d_same_day_lq:int,d_current_day:string,... 4 more fields>' but ' ' is found.
      	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:360)
      	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
      	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:483)
      	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
      	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfoFromTypeString(TypeInfoUtils.java:770)
      	at org.apache.spark.sql.hive.orc.OrcSerializer.<init>(OrcFileFormat.scala:184)
      	at org.apache.spark.sql.hive.orc.OrcOutputWriter.<init>(OrcFileFormat.scala:220)
      	at org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:93)
      	at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:130)
      	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:246)
      	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
      	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      	at org.apache.spark.scheduler.Task.run(Task.scala:85)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
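
The "... 4 more fields" suffix in the message above suggests that a shortened, display-oriented form of the schema string reached the Hive type parser instead of the complete one. The following is a minimal Python sketch of that reading, not Spark's or Hive's actual code: the helper names, the `max_fields=25` threshold (chosen only because the message shows 24 fields plus 4 hidden ones), and the character check standing in for a real parser are all assumptions made for illustration.

```python
# Column names and types copied from the CREATE EXTERNAL TABLE statement in this report.
FIELDS = [
    ("d_date_sk", "bigint"), ("d_date_id", "string"), ("d_date", "string"),
    ("d_month_seq", "int"), ("d_week_seq", "int"), ("d_quarter_seq", "int"),
    ("d_year", "int"), ("d_dow", "int"), ("d_moy", "int"), ("d_dom", "int"),
    ("d_qoy", "int"), ("d_fy_year", "int"), ("d_fy_quarter_seq", "int"),
    ("d_fy_week_seq", "int"), ("d_day_name", "string"),
    ("d_quarter_name", "string"), ("d_holiday", "string"),
    ("d_weekend", "string"), ("d_following_holiday", "string"),
    ("d_first_dom", "int"), ("d_last_dom", "int"), ("d_same_day_ly", "int"),
    ("d_same_day_lq", "int"), ("d_current_day", "string"),
    ("d_current_week", "string"), ("d_current_month", "string"),
    ("d_current_quarter", "string"), ("d_current_year", "string"),
]


def full_struct_string(fields):
    """Complete 'struct<name:type,...>' string listing every field."""
    return "struct<" + ",".join(f"{name}:{typ}" for name, typ in fields) + ">"


def truncated_struct_string(fields, max_fields=25):
    """Display form (hypothetical): keep max_fields - 1 fields, summarize the rest."""
    if len(fields) <= max_fields:
        return full_struct_string(fields)
    shown = fields[: max_fields - 1]
    hidden = len(fields) - len(shown)
    body = ",".join(f"{name}:{typ}" for name, typ in shown)
    return f"struct<{body},... {hidden} more fields>"


def first_unparseable_position(type_string):
    """Crude stand-in for a type parser: index of the first character that
    cannot appear in a 'struct<name:type,...>' string, or None if all are fine."""
    allowed = set(":,<>_.")
    for index, char in enumerate(type_string):
        if not (char.isalnum() or char in allowed):
            return index
    return None


if __name__ == "__main__":
    print(first_unparseable_position(truncated_struct_string(FIELDS)))  # 400
    print(first_unparseable_position(full_struct_string(FIELDS)))       # None
```

Under these assumptions the first offending character is the space after "..." at index 400, which matches the position reported in the exception, while the complete 28-field string passes the check.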
      

            People

            • Assignee: Adrian Wang (adrian-wang)
            • Reporter: Yi Zhou (jameszhouyi)