Details
-
Sub-task
-
Status: Closed
-
Major
-
Resolution: Fixed
-
2.3.2, 2.4.0, 3.0.0
-
None
-
None
Description
For schema evolution, it includes the following points:
1. column changes
column reorder
column add, column delete
column rename
2. type conversion
low precision to high precision
type to String
For 1st type, current the code is not supporting the column addition operation. Detailed error is as follows:
0: jdbc:hive2://localhost:10000/default> desc test_p; +-----------+------------+----------+ | col_name | data_type | comment | +-----------+------------+----------+ | t1 | tinyint | | | t2 | tinyint | | | i1 | int | | | i2 | int | | +-----------+------------+----------+ 0: jdbc:hive2://localhost:10000/default> set hive.fetch.task.conversion=none; 0: jdbc:hive2://localhost:10000/default> set hive.vectorized.execution.enabled=true; 0: jdbc:hive2://localhost:10000/default> alter table test_p add columns (ts timestamp); 0: jdbc:hive2://localhost:10000/default> select * from test_p; Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
Following exception is seen in the logs
Caused by: java.lang.IllegalArgumentException: [ts] BINARY is not in the store: [[i1] INT32, [i2] INT32, [t1] INT32, [t2] INT32] 3 at org.apache.parquet.hadoop.ColumnChunkPageReadStore.getPageReader(ColumnChunkPageReadStore.java:160) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:479) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:432) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:393) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:345) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:88) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?] at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?] at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?] at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?] at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271) ~[hadoop-mapreduce-client-common-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_121] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121] at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_121]
For 2nd type operation, non Vectorized Parquet reader leverages existing Parquet String inspector to do the conversion while vectorized path does not.
To support, this JIRA is providing an abstract layer to read the underlying data and convert it to what Hive required for further computing.