CarbonData / CARBONDATA-635

ClassCastException in Spark 2.1 cluster mode in insert query when column names are changed or column order differs between the tables


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0-incubating
    • Fix Version/s: 1.0.0-incubating
    • Component/s: data-load
    • Labels:
      None
    • Environment:
      Spark 2.1 Cluster mode

      Description

      ::::::::: SCENARIO 1 :::::::

      CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

      CREATE TABLE student (CUST_ID2 int,CUST_ADDR String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

      LOAD DATA inpath 'hdfs://hadoop-master:54311/data/2000_UniqData.csv' INTO table uniqdata options('DELIMITER'=',', 'FILEHEADER'='CUST_ID, CUST_NAME, ACTIVE_EMUI_VERSION, DOB, DOJ, BIGINT_COLUMN1, BIGINT_COLUMN2, DECIMAL_COLUMN1, DECIMAL_COLUMN2, Double_COLUMN1, Double_COLUMN2, INTEGER_COLUMN1');

      insert into student select * from uniqdata;

      ::::::::: SCENARIO 2 :::::::

      CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

      CREATE TABLE student (ACTIVE_EMUI_VERSION string, DOB timestamp, CUST_ID int,CUST_NAME String, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

      LOAD DATA inpath 'hdfs://hadoop-master:54311/data/2000_UniqData.csv' INTO table uniqdata options('DELIMITER'=',', 'FILEHEADER'='CUST_ID, CUST_NAME, ACTIVE_EMUI_VERSION, DOB, DOJ, BIGINT_COLUMN1, BIGINT_COLUMN2, DECIMAL_COLUMN1, DECIMAL_COLUMN2, Double_COLUMN1, Double_COLUMN2, INTEGER_COLUMN1');
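      In Spark SQL, INSERT INTO ... SELECT * pairs source columns with target columns by position, not by name. The Java sketch below is a hypothetical helper (not a CarbonData or Spark API) that applies that pairing to the first five columns of each table; the remaining columns are identical in both tables in each scenario, so they are left out. It suggests the two scenarios differ in kind: in scenario 1 only names differ and every position pairs the same type, while in scenario 2 four positions pair different types (for example int into string). Since scenario 1 fails too, a type mismatch in the insert itself does not seem to explain the exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a CarbonData or Spark API: pairs the columns of
// "INSERT INTO target SELECT * FROM source" by position and reports every
// position where the declared type or the column name differs.
public class PositionalInsertCheck {

    // First five columns of uniqdata (same in both scenarios).
    static final String[][] UNIQDATA = {
        {"CUST_ID", "int"}, {"CUST_NAME", "string"},
        {"ACTIVE_EMUI_VERSION", "string"}, {"DOB", "timestamp"},
        {"DOJ", "timestamp"}
    };

    // First five columns of student in scenario 1: two columns renamed.
    static final String[][] STUDENT_SCENARIO1 = {
        {"CUST_ID2", "int"}, {"CUST_ADDR", "string"},
        {"ACTIVE_EMUI_VERSION", "string"}, {"DOB", "timestamp"},
        {"DOJ", "timestamp"}
    };

    // First five columns of student in scenario 2: same names, new order.
    static final String[][] STUDENT_SCENARIO2 = {
        {"ACTIVE_EMUI_VERSION", "string"}, {"DOB", "timestamp"},
        {"CUST_ID", "int"}, {"CUST_NAME", "string"},
        {"DOJ", "timestamp"}
    };

    static List<String> compare(String[][] source, String[][] target) {
        List<String> diffs = new ArrayList<>();
        int n = Math.min(source.length, target.length);
        for (int i = 0; i < n; i++) {
            String[] s = source[i];
            String[] t = target[i];
            if (!s[1].equalsIgnoreCase(t[1])) {
                // A value of one type would land in a column of another type.
                diffs.add("type@" + i + ": " + s[0] + " " + s[1]
                        + " -> " + t[0] + " " + t[1]);
            } else if (!s[0].equalsIgnoreCase(t[0])) {
                // Same type at this position, different column name.
                diffs.add("name@" + i + ": " + s[0] + " -> " + t[0]);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        System.out.println("Scenario 1: " + compare(UNIQDATA, STUDENT_SCENARIO1));
        System.out.println("Scenario 2: " + compare(UNIQDATA, STUDENT_SCENARIO2));
    }
}
```

      Scenario 1 reports only name@0 and name@1; scenario 2 reports type differences at positions 0 through 3.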

      Both scenarios fail with the same exception when the insert is run:

      0: jdbc:hive2://hadoop-master:10000> insert into student select * from uniqdata;
      Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 26.0 failed 4 times, most recent failure: Lost task 0.3 in stage 26.0 (TID 38, 192.168.2.176, executor 0): java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer
      at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
      at org.apache.spark.sql.CarbonDictionaryDecoder$$anonfun$doExecute$1$$anonfun$7$$anon$1$$anonfun$next$1.apply$mcVI$sp(CarbonDictionaryDecoder.scala:186)
      at org.apache.spark.sql.CarbonDictionaryDecoder$$anonfun$doExecute$1$$anonfun$7$$anon$1$$anonfun$next$1.apply(CarbonDictionaryDecoder.scala:183)
      at org.apache.spark.sql.CarbonDictionaryDecoder$$anonfun$doExecute$1$$anonfun$7$$anon$1$$anonfun$next$1.apply(CarbonDictionaryDecoder.scala:183)
      at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
      at org.apache.spark.sql.CarbonDictionaryDecoder$$anonfun$doExecute$1$$anonfun$7$$anon$1.next(CarbonDictionaryDecoder.scala:183)
      at org.apache.spark.sql.CarbonDictionaryDecoder$$anonfun$doExecute$1$$anonfun$7$$anon$1.next(CarbonDictionaryDecoder.scala:174)
      at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
      at org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD.compute(CarbonGlobalDictionaryRDD.scala:293)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
      at org.apache.spark.scheduler.Task.run(Task.scala:99)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      Driver stacktrace: (state=,code=0)
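      Reading the trace, the failure is in scala.runtime.BoxesRunTime.unboxToInt, called from CarbonDictionaryDecoder.scala:186 while CarbonBlockDistinctValuesCombineRDD collects distinct values during the insert: a row slot that the decoder expected to hold an Integer (presumably a dictionary surrogate key) held a UTF8String instead. The Java sketch below only illustrates that failure mode under stated assumptions: unboxToInt is a local stand-in that roughly mirrors the Scala runtime method (null becomes 0, otherwise cast to Integer and unbox), a plain String replaces Spark's UTF8String to avoid a Spark dependency, and decode is hypothetical, not CarbonData's decoder.

```java
// Illustration only, not CarbonData code: a decoder that expects every
// dictionary-encoded slot of a row to hold a boxed Integer surrogate key.
public class UnboxFailure {

    // Local stand-in roughly mirroring scala.runtime.BoxesRunTime.unboxToInt:
    // null becomes 0, anything else is cast to Integer and unboxed.
    static int unboxToInt(Object value) {
        return value == null ? 0 : ((Integer) value).intValue();
    }

    // Hypothetical decode step: returns the surrogate keys of the row, or
    // "ClassCastException" if some slot does not hold an Integer.
    static String decode(Object[] row) {
        StringBuilder keys = new StringBuilder();
        try {
            for (Object slot : row) {
                keys.append(unboxToInt(slot)).append(' ');
            }
            return "ok: " + keys.toString().trim();
        } catch (ClassCastException e) {
            // Same exception type as in the trace; the JVM message text varies
            // between Java versions, so only the type is reported here.
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new Object[] {3, 7}));       // surrogate keys only
        System.out.println(decode(new Object[] {3, "Alice"})); // an already-decoded string
    }
}
```

      The first call prints "ok: 3 7"; the second hits the cast failure, just as a UTF8String does in the reported trace.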

        Attachments

        1. driverlog
          13 kB
          Harsh Sharma
        2. 2000_UniqData.csv
          416 kB
          Harsh Sharma


            People

            • Assignee:
              ravi.pesala Ravindra Pesala
              Reporter:
              harshsharma8 Harsh Sharma
            • Votes:
              0
              Watchers:
              2


                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                0h
                Logged:
                1h 50m