CarbonData / CARBONDATA-2395

Not able to fetch all data from a complex data type table


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: data-query
    • Labels: None
    • Environment: spark 2.1 and spark 2.2

    Description

      Not able to fetch all data from a complex data type table

      Steps to reproduce:

      1) Create a table with a complex data type column:

       create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_OF_ARRAY struct<ID: int,CHECK_DATE: timestamp,SNo: array<int>,sal1: array<double>,state: array<string>,date1: array<timestamp>>,CARD_COUNT int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) STORED BY 'org.apache.carbondata.format';

       

      2) Load data into the table:

      LOAD DATA INPATH 'hdfs://localhost:54310/Data/complex/structofarray.csv' INTO table STRUCT_OF_ARRAY_com options ('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='CUST_ID,YEAR,MONTH,AGE,GENDER,EDUCATED,IS_MARRIED,STRUCT_OF_ARRAY,CARD_COUNT,DEBIT_COUNT,CREDIT_COUNT,DEPOSIT,HQ_DEPOSIT','COMPLEX_DELIMITER_LEVEL_1'='$','COMPLEX_DELIMITER_LEVEL_2'='&');
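The LOAD statement relies on CarbonData's two-level complex delimiters: the level-1 delimiter ('$') separates the fields of the struct, and the level-2 delimiter ('&') separates the elements of each nested array. A minimal Python sketch of that decomposition (the sample field value is invented for illustration, not taken from structofarray.csv):

```python
# Sketch of how a struct-typed CSV field decomposes under the
# COMPLEX_DELIMITER_LEVEL_1/2 options above. The sample value is
# hypothetical, chosen to match the STRUCT_OF_ARRAY schema shape.

def split_complex(field, level1="$", level2="&"):
    """Split a struct field on the level-1 delimiter, then split any
    part containing the level-2 delimiter into array elements."""
    parts = field.split(level1)
    return [p.split(level2) if level2 in p else p for p in parts]

# Hypothetical STRUCT_OF_ARRAY value:
# struct<ID, CHECK_DATE, SNo: array<int>, sal1: array<double>,
#        state: array<string>, date1: array<timestamp>>
sample = "1$2015-07-23 01:00:03$1&2&3$500.5&600.5$AB&CD"
print(split_complex(sample))
```

This only illustrates the delimiter encoding expected in the CSV; CarbonData's own loader does the actual parsing and type conversion.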

      3) Execute Query:

      select * from struct_of_array_com;

      4) Expected Result: It should display all data from the table.

      5) Actual Result:

      Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 5, localhost, executor driver): java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.catalyst.InternalRow
      at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getStruct(rows.scala:51)
      at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:194)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
      at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
      at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      at org.apache.spark.scheduler.Task.run(Task.scala:108)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)

       

      thrift log:

      18/04/24 18:28:14 INFO SparkExecuteStatementOperation: Running query 'select * from struct_of_array_com' with 82515cd4-77c1-411e-8e6a-6354c75d02bf
      18/04/24 18:28:14 INFO CarbonSparkSqlParser: Parsing command: select * from struct_of_array_com
      18/04/24 18:28:14 INFO HiveMetaStore: 12: get_table : db=bug tbl=struct_of_array_com
      18/04/24 18:28:14 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_table : db=bug tbl=struct_of_array_com
      18/04/24 18:28:14 INFO HiveMetaStore: 12: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
      18/04/24 18:28:14 INFO ObjectStore: ObjectStore, initialize called
      18/04/24 18:28:14 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
      18/04/24 18:28:14 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
      18/04/24 18:28:14 INFO ObjectStore: Initialized ObjectStore
      18/04/24 18:28:14 INFO CatalystSqlParser: Parsing command: array<string>
      18/04/24 18:28:14 INFO CarbonLRUCache: pool-23-thread-11 Removed entry from InMemory lru cache :: hdfs://localhost:54310/opt/CarbonStore/bug/struct_of_array_com/Fact/Part0/Segment_0/0_batchno0-0-1524572568106.carbonindex
      18/04/24 18:28:14 INFO CarbonLRUCache: pool-23-thread-11 Removed entry from InMemory lru cache :: hdfs://localhost:54310/opt/CarbonStore/bug/struct_of_array_com/Fact/Part0/Segment_1/0_batchno0-0-1524573891417.carbonindex
      18/04/24 18:28:14 INFO HiveMetaStore: 12: get_table : db=bug tbl=struct_of_array_com
      18/04/24 18:28:14 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_table : db=bug tbl=struct_of_array_com
      18/04/24 18:28:14 INFO CatalystSqlParser: Parsing command: array<string>
      18/04/24 18:28:14 INFO HiveMetaStore: 12: get_database: bug
      18/04/24 18:28:14 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_database: bug
      18/04/24 18:28:14 INFO HiveMetaStore: 12: get_database: bug
      18/04/24 18:28:14 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_database: bug
      18/04/24 18:28:14 INFO HiveMetaStore: 12: get_tables: db=bug pat=*
      18/04/24 18:28:14 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_tables: db=bug pat=*
      18/04/24 18:28:14 INFO TableInfo: pool-23-thread-11 Table block size not specified for bug_struct_of_array_com. Therefore considering the default value 1024 MB
      18/04/24 18:28:14 INFO CarbonLateDecodeRule: pool-23-thread-11 skip CarbonOptimizer
      18/04/24 18:28:14 INFO CarbonLateDecodeRule: pool-23-thread-11 Skip CarbonOptimizer
      18/04/24 18:28:14 INFO CodeGenerator: Code generated in 48.728783 ms
      18/04/24 18:28:14 INFO TableInfo: pool-23-thread-11 Table block size not specified for bug_struct_of_array_com. Therefore considering the default value 1024 MB
      18/04/24 18:28:14 INFO BlockletDataMap: pool-23-thread-11 Time taken to load blocklet datamap from file : hdfs://localhost:54310/opt/CarbonStore/bug/struct_of_array_com/Fact/Part0/Segment_0/0_batchno0-0-1524572568106.carbonindexis 1
      18/04/24 18:28:14 INFO BlockletDataMap: pool-23-thread-11 Time taken to load blocklet datamap from file : hdfs://localhost:54310/opt/CarbonStore/bug/struct_of_array_com/Fact/Part0/Segment_1/0_batchno0-0-1524573891417.carbonindexis 1
      18/04/24 18:28:14 INFO CarbonScanRDD:
      Identified no.of.blocks: 2,
      no.of.tasks: 2,
      no.of.nodes: 0,
      parallelism: 4

      18/04/24 18:28:14 INFO SparkContext: Starting job: run at AccessController.java:0
      18/04/24 18:28:14 INFO DAGScheduler: Got job 5 (run at AccessController.java:0) with 2 output partitions
      18/04/24 18:28:14 INFO DAGScheduler: Final stage: ResultStage 5 (run at AccessController.java:0)
      18/04/24 18:28:14 INFO DAGScheduler: Parents of final stage: List()
      18/04/24 18:28:14 INFO DAGScheduler: Missing parents: List()
      18/04/24 18:28:14 INFO DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[17] at run at AccessController.java:0), which has no missing parents
      18/04/24 18:28:14 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 39.9 KB, free 366.1 MB)
      18/04/24 18:28:14 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 28.5 KB, free 366.0 MB)
      18/04/24 18:28:14 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.2.102:40681 (size: 28.5 KB, free: 366.2 MB)
      18/04/24 18:28:14 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1006
      18/04/24 18:28:14 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 5 (MapPartitionsRDD[17] at run at AccessController.java:0) (first 15 tasks are for partitions Vector(0, 1))
      18/04/24 18:28:14 INFO TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
      18/04/24 18:28:14 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 7, localhost, executor driver, partition 0, ANY, 6822 bytes)
      18/04/24 18:28:14 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 8, localhost, executor driver, partition 1, ANY, 6821 bytes)
      18/04/24 18:28:14 INFO Executor: Running task 0.0 in stage 5.0 (TID 7)
      18/04/24 18:28:14 INFO Executor: Running task 1.0 in stage 5.0 (TID 8)
      18/04/24 18:28:14 INFO TableInfo: Executor task launch worker for task 7 Table block size not specified for bug_struct_of_array_com. Therefore considering the default value 1024 MB
      18/04/24 18:28:14 INFO AbstractQueryExecutor: [Executor task launch worker for task 7][partitionID:com;queryID:3128889732141] Query will be executed on table: struct_of_array_com
      18/04/24 18:28:14 INFO TableInfo: Executor task launch worker for task 8 Table block size not specified for bug_struct_of_array_com. Therefore considering the default value 1024 MB
      18/04/24 18:28:14 INFO AbstractQueryExecutor: [Executor task launch worker for task 8][partitionID:com;queryID:3128889732141] Query will be executed on table: struct_of_array_com
      18/04/24 18:28:14 INFO ResultCollectorFactory: [Executor task launch worker for task 7][partitionID:com;queryID:3128889732141] Restructure based dictionary collector is used to scan and collect the data
      18/04/24 18:28:14 INFO ResultCollectorFactory: [Executor task launch worker for task 8][partitionID:com;queryID:3128889732141] Row based dictionary collector is used to scan and collect the data
      18/04/24 18:28:14 INFO UnsafeMemoryManager: [Executor task launch worker for task 7][partitionID:com;queryID:3128889732141] Total memory used after task 3129200578920 is 11638 Current tasks running now are : [3129107801490, 2657313584388, 2692308631405, 2620118305297, 2722177005617, 2671201017807]
      18/04/24 18:28:14 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 7)
      java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.catalyst.InternalRow
      at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getStruct(rows.scala:51)
      at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:194)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
      at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
      at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      at org.apache.spark.scheduler.Task.run(Task.scala:108)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      18/04/24 18:28:14 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 7, localhost, executor driver): java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.catalyst.InternalRow
      at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getStruct(rows.scala:51)
      at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:194)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
      at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
      at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)

      Attachments

        structofarray.csv (193 kB, attached by Vandana Yadav)


            People

              Assignee: Unassigned
              Reporter: Vandana Yadav
