  Apache Hudi / HUDI-3237

ALTER TABLE column type change fails select query

Details

    Description

      create table if not exists cow_nonpt_nonpcf_tbl (
        id int,
        name string,
        price double
      ) using hudi
      options (
        type = 'cow',
        primaryKey = 'id'
      );
      
      insert into cow_nonpt_nonpcf_tbl select 1, 'a1', 20;
      
      DESC cow_nonpt_nonpcf_tbl;
      
      -- shows id int
      
      ALTER TABLE cow_nonpt_nonpcf_tbl change column id id bigint;
      
      DESC cow_nonpt_nonpcf_tbl;
      
      -- shows id bigint
      -- this works fine so far
      
      select * from cow_nonpt_nonpcf_tbl;
      
      -- throws the exception below
      
      
      org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file file:///opt/spark-warehouse/cow_nonpt_nonpcf_tbl/ff3c68e6-84d4-4a8a-8bc8-cc58736847aa-0_0-7-7_20220112182401452.parquet. Column: [id], Expected: bigint, Found: INT32
              at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179)
              at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
              at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503)
              at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
              at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
              at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
              at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
              at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
              at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
              at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
              at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
              at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
              at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
              at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
              at org.apache.spark.scheduler.Task.run(Task.scala:131)
              at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
              at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
              at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at java.lang.Thread.run(Unknown Source)
      Caused by: org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException
              at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.constructConvertNotSupportedException(VectorizedColumnReader.java:339)
              at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readIntBatch(VectorizedColumnReader.java:571)
              at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:294)
              at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:283)
              at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:181)
              at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:37)
              at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
              at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:173)
              ... 20 more
      
      

      Reported while testing Hudi 0.10.1-rc1 (Spark 3.0.3 and 3.1.2).
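
The error message points to a mismatch between the altered table schema (id bigint) and the physical type stored in the data file that was written before the ALTER (INT32). The following is a minimal Python sketch of that mismatch, not Spark or Hudi code. The names PHYSICAL_TYPE and mismatched_columns are hypothetical, and the schemas are simplified to the three user columns from the reproduction above (Hudi metadata columns are left out).

```python
# Illustrative model only; nothing here calls Spark or Hudi.
# Parquet physical type that Spark writes for each SQL type used in the repro.
PHYSICAL_TYPE = {
    "int": "INT32",
    "bigint": "INT64",
    "double": "DOUBLE",
    "string": "BINARY",
}


def mismatched_columns(table_schema, file_schema):
    """Return (column, expected SQL type, found physical type) for each
    column whose type in the file differs from what the table expects."""
    problems = []
    for column, sql_type in table_schema.items():
        found = file_schema.get(column)
        if found is not None and found != PHYSICAL_TYPE[sql_type]:
            problems.append((column, sql_type, found))
    return problems


# Types stored in the data file written by the INSERT, before the ALTER.
file_schema = {"id": "INT32", "name": "BINARY", "price": "DOUBLE"}

# Table schema before and after changing id from int to bigint.
schema_before_alter = {"id": "int", "name": "string", "price": "double"}
schema_after_alter = {"id": "bigint", "name": "string", "price": "double"}

print(mismatched_columns(schema_before_alter, file_schema))  # []
print(mismatched_columns(schema_after_alter, file_schema))
# [('id', 'bigint', 'INT32')]
```

The second result mirrors the "Column: [id], Expected: bigint, Found: INT32" part of the exception: DESC reflects the new table schema, while the existing file still holds the old physical type.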

            People

              biyan900116@gmail.com Yann Byron
              xushiyan Raymond Xu
              Raymond Xu, sivabalan narayanan
                Time Tracking

                  Estimated: 0.5h
                  Remaining: 0.25h
                  Logged: 0.25h