Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 2.4.0, 2.3.2
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None

      Description

      Schema evolution covers two kinds of change:
      1. Column changes
         • column reorder
         • column add, column delete
         • column rename
      2. Type conversion
         • low precision to high precision
         • type to String
      For the first kind, the current code does not support the column-add operation. The detailed error is as follows:

      0: jdbc:hive2://localhost:10000/default> desc test_p;
      +-----------+------------+----------+
      | col_name  | data_type  | comment  |
      +-----------+------------+----------+
      | t1        | tinyint    |          |
      | t2        | tinyint    |          |
      | i1        | int        |          |
      | i2        | int        |          |
      +-----------+------------+----------+
      0: jdbc:hive2://localhost:10000/default> set hive.fetch.task.conversion=none;
      0: jdbc:hive2://localhost:10000/default> set hive.vectorized.execution.enabled=true;
      0: jdbc:hive2://localhost:10000/default> alter table test_p add columns (ts timestamp);
      0: jdbc:hive2://localhost:10000/default> select * from test_p;
      Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
      

      The following exception appears in the logs:

      Caused by: java.lang.IllegalArgumentException: [ts] BINARY is not in the store: [[i1] INT32, [i2] INT32, [t1] INT32, [t2] INT32] 3
              at org.apache.parquet.hadoop.ColumnChunkPageReadStore.getPageReader(ColumnChunkPageReadStore.java:160) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:479) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:432) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:393) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:345) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:88) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271) ~[hadoop-mapreduce-client-common-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_121]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_121]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
              at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_121]
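
The trace above shows the reader asking the row group for a column ([ts]) that the file was written without. As a rough illustration only, and not the code from the attached patches, one way a reader could tolerate this is to return nulls for any requested column the file does not contain. The class and method names below (MissingColumnSketch, readBatch) are hypothetical, and a row group is modeled as a plain map rather than with the real Parquet or Hive classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a row group is a map from column name to stored values.
class MissingColumnSketch {

    /**
     * Reads the requested columns from a row group. A column that the table
     * schema requests but the file lacks (for example one added later with
     * ALTER TABLE ... ADD COLUMNS) is returned as all nulls instead of failing.
     */
    static Map<String, Object[]> readBatch(List<String> requestedColumns,
                                           Map<String, Object[]> rowGroup,
                                           int rowCount) {
        Map<String, Object[]> batch = new LinkedHashMap<>();
        for (String column : requestedColumns) {
            Object[] stored = rowGroup.get(column);
            if (stored == null) {
                // Column absent from the file: every row reads as NULL.
                batch.put(column, new Object[rowCount]);
            } else {
                batch.put(column, Arrays.copyOf(stored, rowCount));
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        Map<String, Object[]> rowGroup = new LinkedHashMap<>();
        rowGroup.put("t1", new Object[] {1, 2});
        rowGroup.put("i1", new Object[] {10, 20});

        // "ts" was added to the table after this file was written.
        Map<String, Object[]> batch =
            readBatch(Arrays.asList("t1", "i1", "ts"), rowGroup, 2);
        System.out.println(Arrays.toString(batch.get("ts"))); // prints [null, null]
    }
}
```

The design choice here is to decide per column, before touching the page store, whether the file can supply it; the existing columns are read unchanged.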
      

      For the second kind, the non-vectorized Parquet reader uses the existing Parquet String inspector to do the conversion, while the vectorized path does not.
      To support these cases, this JIRA provides an abstraction layer that reads the underlying data and converts it to the type Hive requires for further computation.
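
To make the idea of such a layer concrete, here is a minimal sketch under stated assumptions. It is not the interface introduced by the patch: ValueReader, Int32Reader, and FloatReader are hypothetical names, and a single value stands in for a column page. The point is only that the caller asks for the type the table schema requests, and the reader for the physically stored type performs the widening or the conversion to String.

```java
// Hypothetical sketch of an accessor layer between stored and requested types.
class TypeConversionSketch {

    /** Presents a stored value as whichever type the table schema requests. */
    interface ValueReader {
        long readLong();
        double readDouble();
        String readString();
    }

    /** Reader for a value physically stored as a 32-bit integer. */
    static final class Int32Reader implements ValueReader {
        private final int stored;

        Int32Reader(int stored) {
            this.stored = stored;
        }

        @Override
        public long readLong() {
            return stored; // int -> long widening, no loss of precision
        }

        @Override
        public double readDouble() {
            return stored; // int -> double is exact for every 32-bit int
        }

        @Override
        public String readString() {
            return Integer.toString(stored); // "type to String" case
        }
    }

    /** Reader for a value physically stored as a 32-bit float. */
    static final class FloatReader implements ValueReader {
        private final float stored;

        FloatReader(float stored) {
            this.stored = stored;
        }

        @Override
        public long readLong() {
            // float -> long is not a widening conversion, so this sketch rejects it.
            throw new UnsupportedOperationException("float cannot be read as long");
        }

        @Override
        public double readDouble() {
            return stored; // float -> double widening
        }

        @Override
        public String readString() {
            return Float.toString(stored);
        }
    }

    public static void main(String[] args) {
        ValueReader reader = new Int32Reader(7);
        System.out.println(reader.readLong());   // prints 7
        System.out.println(reader.readString()); // prints 7
    }
}
```

With this shape, the vectorized and non-vectorized paths could share one set of conversions, because the choice of conversion depends only on the stored type and the requested type.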

        Attachments

        1. test_result_based_on_HIVE-18553.xlsx (9 kB, Ferdinand Xu)
        2. HIVE-18553.patch (6 kB, Ferdinand Xu)
        3. HIVE-18553.91.patch (179 kB, Vihang Karajgaonkar)
        4. HIVE-18553.9.patch (179 kB, Ferdinand Xu)
        5. HIVE-18553.8.patch (175 kB, Ferdinand Xu)
        6. HIVE-18553.7.patch (146 kB, Ferdinand Xu)
        7. HIVE-18553.6.patch (120 kB, Ferdinand Xu)
        8. HIVE-18553.5.patch (71 kB, Ferdinand Xu)
        9. HIVE-18553.4.patch (40 kB, Ferdinand Xu)
        10. HIVE-18553.3.patch (37 kB, Ferdinand Xu)
        11. HIVE-18553.2.patch (7 kB, Ferdinand Xu)
        12. HIVE-18553.11.patch (179 kB, Ferdinand Xu)
        13. HIVE-18553.10.patch (179 kB, Ferdinand Xu)

              People

              • Assignee: Ferdinand Xu (Ferd)
              • Reporter: Vihang Karajgaonkar (vihangk1)
              • Votes: 0
              • Watchers: 4
