Hive / HIVE-24504

VectorFileSinkArrowOperator does not serialize complex types correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: llap

      Description

      When the table has complex types and the result has 0 records, the VectorFileSinkArrowOperator serializes only the primitive types correctly. For complex types, only the top-level type is set, which causes failures in clients that try to read the data.

      Reading such a result through HWC (Hive Warehouse Connector) fails with the following exception:

      Previous exception in task: Unsupported data type: Null
      	org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71)
      	org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106)
      	org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98)
      	org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala)
      	org.apache.spark.sql.vectorized.ArrowColumnVector.<init>(ArrowColumnVector.java:135)
      	com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105)
      	com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29)
      	org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
      	org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
      	org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown Source)
      	org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
      	org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      	org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
      	org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
      	org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
      	org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
      	org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
      	org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
      	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
      	org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
      	org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
      	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
      	org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
      	org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      	org.apache.spark.scheduler.Task.run(Task.scala:109)
      	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
      	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	java.lang.Thread.run(Thread.java:745)
      	at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117)
      	at org.apache.spark.scheduler.Task.run(Task.scala:119)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745) 
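
      The failure mode described above can be illustrated with a small, self-contained sketch. It is a simplified model, not the Arrow, Spark, or Hive API: the SchemaSketch class, its Field and Kind types, and the toReaderType method are all hypothetical names introduced here. It assumes, as the recursive fromArrowField frames in the trace suggest, that a complex column arrives with a child whose type is Null, and that the reader rejects that type while converting the schema.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical model of a typed schema tree. These classes are
// illustrative stand-ins, not the Apache Arrow, Spark, or Hive APIs.
final class SchemaSketch {

    enum Kind { NULL, INT, STRING, LIST, STRUCT }

    // A field has a name, a type, and (for complex types) child fields.
    static final class Field {
        final String name;
        final Kind kind;
        final List<Field> children;

        Field(String name, Kind kind, List<Field> children) {
            this.name = name;
            this.kind = kind;
            this.children = children;
        }

        static Field leaf(String name, Kind kind) {
            return new Field(name, kind, Collections.<Field>emptyList());
        }

        static Field list(String name, Field element) {
            return new Field(name, Kind.LIST, Collections.singletonList(element));
        }

        static Field struct(String name, Field... members) {
            return new Field(name, Kind.STRUCT, Arrays.asList(members));
        }
    }

    // Recursively maps a field to a reader-side type name. Any kind without a
    // mapping (here: NULL) is rejected, similar in spirit to the reader in the
    // trace rejecting a type it cannot convert.
    static String toReaderType(Field field) {
        switch (field.kind) {
            case INT:
                return "int";
            case STRING:
                return "string";
            case LIST:
                return "array<" + toReaderType(field.children.get(0)) + ">";
            case STRUCT: {
                StringBuilder sb = new StringBuilder("struct<");
                for (int i = 0; i < field.children.size(); i++) {
                    Field child = field.children.get(i);
                    if (i > 0) {
                        sb.append(",");
                    }
                    sb.append(child.name).append(":").append(toReaderType(child));
                }
                return sb.append(">").toString();
            }
            default:
                throw new UnsupportedOperationException(
                        "Unsupported data type: " + field.kind + " (field " + field.name + ")");
        }
    }

    public static void main(String[] args) {
        // Fully described schema: the list carries its element type.
        Field complete = Field.list("tags", Field.leaf("element", Kind.STRING));
        System.out.println(toReaderType(complete));

        // Only the top-level type is meaningful: the element is left as NULL.
        Field incomplete = Field.list("tags", Field.leaf("element", Kind.NULL));
        try {
            toReaderType(incomplete);
        } catch (UnsupportedOperationException e) {
            System.out.println("reader failed: " + e.getMessage());
        }
    }
}
```

      In this model the conversion succeeds only when every nested child carries a concrete type, which is why a schema that sets just the top-level type of a complex column breaks the reader even though no rows are transferred.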

              People

              • Assignee: Peter Vary (pvary)
              • Reporter: Peter Vary (pvary)
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 1h