  1. Hive
  2. HIVE-14826 Support vectorization for Parquet
  3. HIVE-18553

Support schema evolution in Parquet Vectorization reader

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.2, 2.4.0, 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None

    Description

      Schema evolution covers the following:
      1. Column changes
         column reorder
         column add, column delete
         column rename
      2. Type conversion
         low precision to high precision
         type to String
      For the first category, the current code does not support adding a column. The detailed error is as follows:

      0: jdbc:hive2://localhost:10000/default> desc test_p;
      +-----------+------------+----------+
      | col_name  | data_type  | comment  |
      +-----------+------------+----------+
      | t1        | tinyint    |          |
      | t2        | tinyint    |          |
      | i1        | int        |          |
      | i2        | int        |          |
      +-----------+------------+----------+
      0: jdbc:hive2://localhost:10000/default> set hive.fetch.task.conversion=none;
      0: jdbc:hive2://localhost:10000/default> set hive.vectorized.execution.enabled=true;
      0: jdbc:hive2://localhost:10000/default> alter table test_p add columns (ts timestamp);
      0: jdbc:hive2://localhost:10000/default> select * from test_p;
      Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
      

      The following exception is seen in the logs:

      Caused by: java.lang.IllegalArgumentException: [ts] BINARY is not in the store: [[i1] INT32, [i2] INT32, [t1] INT32, [t2] INT32] 3
              at org.apache.parquet.hadoop.ColumnChunkPageReadStore.getPageReader(ColumnChunkPageReadStore.java:160) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:479) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:432) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:393) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:345) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:88) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271) ~[hadoop-mapreduce-client-common-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_121]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_121]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
              at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_121]
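
      The trace shows the reader asking the row group's page store for a column ([ts]) that the file never contained. One common way to tolerate an added column is to check the file schema first and return NULL for any column it lacks. The following is a minimal sketch of that idea only, not the code in the attached patches; ColumnReader, buildReaders, and the placeholder readers are hypothetical names introduced for illustration and are not Hive or Parquet classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissingColumnSketch {

    /** Hypothetical stand-in for a per-column reader; not a Hive class. */
    interface ColumnReader {
        Object read(int row);
    }

    /**
     * Builds one reader per requested column. Columns that are absent from
     * the file schema get a reader that yields NULL for every row, instead
     * of being looked up in the page store (which is what fails above).
     */
    static List<ColumnReader> buildReaders(List<String> requested, Set<String> inFile) {
        List<ColumnReader> readers = new ArrayList<>();
        for (String name : requested) {
            if (inFile.contains(name)) {
                // Placeholder for a real page-backed reader (assumption).
                readers.add(row -> name + "[" + row + "]");
            } else {
                // Column was added after this file was written.
                readers.add(row -> null);
            }
        }
        return readers;
    }

    public static void main(String[] args) {
        // File schema and table schema taken from the session above.
        Set<String> inFile = new HashSet<>(Arrays.asList("t1", "t2", "i1", "i2"));
        List<String> requested = Arrays.asList("t1", "t2", "i1", "i2", "ts");

        List<ColumnReader> readers = buildReaders(requested, inFile);
        for (int i = 0; i < requested.size(); i++) {
            System.out.println(requested.get(i) + " -> " + readers.get(i).read(0));
        }
    }
}
```

      In this sketch the lookup is by column name, so a reordered column would still resolve to the right data; whether the actual fix resolves columns by name or by position is not stated in this issue.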
      

      For the second category, the non-vectorized Parquet reader leverages the existing Parquet String inspector to do the conversion, while the vectorized path does not.
      To support this, this JIRA provides an abstraction layer that reads the underlying data and converts it to the type Hive requires for further computation.
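
      To illustrate what such a layer might look like, here is a minimal sketch under stated assumptions: one interface exposes a stored value as whichever type the table schema now requests, and one implementation exists per physical type. ValueReader and Int32Reader are hypothetical names chosen for this example; they are not the classes introduced by the patches.

```java
import java.nio.charset.StandardCharsets;

public class TypeConversionSketch {

    /**
     * Hypothetical abstraction: reads a value as stored in the file and
     * exposes it as the type the table schema asks for.
     */
    interface ValueReader {
        long readLong();
        double readDouble();
        byte[] readString(); // UTF-8 encoded text
    }

    /** Reader over a value physically stored as INT32. */
    static final class Int32Reader implements ValueReader {
        private final int stored;

        Int32Reader(int stored) {
            this.stored = stored;
        }

        @Override
        public long readLong() {
            return stored; // low precision to high precision: int -> bigint
        }

        @Override
        public double readDouble() {
            return stored; // int -> double, exact for every 32-bit int
        }

        @Override
        public byte[] readString() {
            // type to String: render the number as decimal text
            return Integer.toString(stored).getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        ValueReader reader = new Int32Reader(42);
        System.out.println(reader.readLong());
        System.out.println(reader.readDouble());
        System.out.println(new String(reader.readString(), StandardCharsets.UTF_8));
    }
}
```

      The point of the design is that the caller picks the read method from the requested Hive type, while the implementation is picked from the physical type in the file, so each supported conversion lives in exactly one place.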

      Attachments

        1. HIVE-18553.patch
          6 kB
          Ferdinand Xu
        2. HIVE-18553.2.patch
          7 kB
          Ferdinand Xu
        3. HIVE-18553.3.patch
          37 kB
          Ferdinand Xu
        4. HIVE-18553.4.patch
          40 kB
          Ferdinand Xu
        5. test_result_based_on_HIVE-18553.xlsx
          9 kB
          Ferdinand Xu
        6. HIVE-18553.5.patch
          71 kB
          Ferdinand Xu
        7. HIVE-18553.6.patch
          120 kB
          Ferdinand Xu
        8. HIVE-18553.7.patch
          146 kB
          Ferdinand Xu
        9. HIVE-18553.8.patch
          175 kB
          Ferdinand Xu
        10. HIVE-18553.9.patch
          179 kB
          Ferdinand Xu
        11. HIVE-18553.10.patch
          179 kB
          Ferdinand Xu
        12. HIVE-18553.11.patch
          179 kB
          Ferdinand Xu
        13. HIVE-18553.91.patch
          179 kB
          Vihang Karajgaonkar

            People

              Ferdinand Xu (Ferd)
              Vihang Karajgaonkar (vihangk1)
