Hive / HIVE-14826 Support vectorization for Parquet / HIVE-18553

Support schema evolution in Parquet Vectorization reader


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.2, 2.4.0, 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None

    Description

      Schema evolution covers the following cases:
      1. Column changes
         column reorder
         column add, column delete
         column rename
      2. Type conversion
         low precision to high precision
         type to String
      For the first category, the current code does not support adding a column. The detailed error is as follows:

      0: jdbc:hive2://localhost:10000/default> desc test_p;
      +-----------+------------+----------+
      | col_name  | data_type  | comment  |
      +-----------+------------+----------+
      | t1        | tinyint    |          |
      | t2        | tinyint    |          |
      | i1        | int        |          |
      | i2        | int        |          |
      +-----------+------------+----------+
      0: jdbc:hive2://localhost:10000/default> set hive.fetch.task.conversion=none;
      0: jdbc:hive2://localhost:10000/default> set hive.vectorized.execution.enabled=true;
      0: jdbc:hive2://localhost:10000/default> alter table test_p add columns (ts timestamp);
      0: jdbc:hive2://localhost:10000/default> select * from test_p;
      Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
      

      The following exception is seen in the logs:

      Caused by: java.lang.IllegalArgumentException: [ts] BINARY is not in the store: [[i1] INT32, [i2] INT32, [t1] INT32, [t2] INT32] 3
              at org.apache.parquet.hadoop.ColumnChunkPageReadStore.getPageReader(ColumnChunkPageReadStore.java:160) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:479) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:432) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:393) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:345) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:88) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
              at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271) ~[hadoop-mapreduce-client-common-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_121]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_121]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
              at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_121]
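
The trace above shows the reader asking the Parquet page store for a column ([ts]) that the file was written without. One possible way to handle this, sketched below, is to check the file's columns first and return an all-null vector for any column that is missing. This is only an illustration under stated assumptions: `MissingColumnSketch`, `readBatch`, and the map-based "file" are hypothetical stand-ins, not the code in the HIVE-18553 patch and not Hive or Parquet APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the HIVE-18553 patch and not Hive's API.
final class MissingColumnSketch {

    /**
     * Builds one batch containing every requested column. A column that is
     * absent from the file (for example, one added later with
     * ALTER TABLE ... ADD COLUMNS) is filled with nulls instead of failing.
     */
    static Map<String, Object[]> readBatch(Map<String, Object[]> fileColumns,
                                           List<String> requestedColumns,
                                           int batchSize) {
        Map<String, Object[]> batch = new LinkedHashMap<>();
        for (String name : requestedColumns) {
            Object[] stored = fileColumns.get(name);
            if (stored == null) {
                // Column is not in the file: a new Object[] is all nulls.
                batch.put(name, new Object[batchSize]);
            } else {
                // Column exists: copy (or null-pad) up to the batch size.
                batch.put(name, Arrays.copyOf(stored, batchSize));
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        // The "file" was written before the ts column was added.
        Map<String, Object[]> file = new LinkedHashMap<>();
        file.put("t1", new Object[] {1, 2});
        file.put("i1", new Object[] {10, 20});

        Map<String, Object[]> batch =
            readBatch(file, Arrays.asList("t1", "i1", "ts"), 2);
        for (Map.Entry<String, Object[]> e : batch.entrySet()) {
            System.out.println(e.getKey() + " = " + Arrays.toString(e.getValue()));
        }
    }
}
```

The key design point in this sketch is that the decision is made from the file schema before any page reader is requested, so a missing column never reaches the lookup that throws in the trace.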
      

      For the second category, the non-vectorized Parquet reader uses the existing Parquet String inspector to do the conversion, while the vectorized path does not.
      To support this, this JIRA provides an abstraction layer that reads the underlying data and converts it to the type Hive requires for further computation.
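
A minimal sketch of what such a conversion layer could look like is given below, assuming a converter is chosen once per column from the type stored in the file and the type declared in the table. It covers only the two cases named in this description (low precision to high precision, and type to String). All names here (`ConversionSketch`, `HiveType`, `converter`) are hypothetical and chosen for illustration; they are not the classes introduced by the patch.

```java
import java.util.function.Function;

// Hypothetical illustration only; names are not taken from the HIVE-18553 patch.
final class ConversionSketch {

    enum HiveType { INT, BIGINT, FLOAT, DOUBLE, STRING }

    /**
     * Picks a converter from the type stored in the file to the type the
     * table schema now declares. Nulls pass through unchanged.
     */
    static Function<Object, Object> converter(HiveType fileType, HiveType tableType) {
        if (fileType == tableType) {
            return v -> v; // no schema change for this column
        }
        if (tableType == HiveType.STRING) {
            // "type to String"
            return v -> v == null ? null : String.valueOf(v);
        }
        if (fileType == HiveType.INT && tableType == HiveType.BIGINT) {
            // "low precision to high precision" for integers
            return v -> v == null ? null : Long.valueOf(((Integer) v).longValue());
        }
        if (fileType == HiveType.FLOAT && tableType == HiveType.DOUBLE) {
            // "low precision to high precision" for floating point
            return v -> v == null ? null : Double.valueOf(((Float) v).doubleValue());
        }
        throw new UnsupportedOperationException(
            "No conversion from " + fileType + " to " + tableType);
    }

    public static void main(String[] args) {
        Function<Object, Object> widen = converter(HiveType.INT, HiveType.BIGINT);
        Function<Object, Object> stringify = converter(HiveType.INT, HiveType.STRING);
        System.out.println(widen.apply(7));
        System.out.println(stringify.apply(42));
    }
}
```

Resolving the converter once per column, rather than per value, keeps the per-row work to a single function call, which matters in a vectorized read path.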

      Attachments

        1. test_result_based_on_HIVE-18553.xlsx (9 kB, Ferdinand Xu)
        2. HIVE-18553.patch (6 kB, Ferdinand Xu)
        3. HIVE-18553.91.patch (179 kB, Vihang Karajgaonkar)
        4. HIVE-18553.9.patch (179 kB, Ferdinand Xu)
        5. HIVE-18553.8.patch (175 kB, Ferdinand Xu)
        6. HIVE-18553.7.patch (146 kB, Ferdinand Xu)
        7. HIVE-18553.6.patch (120 kB, Ferdinand Xu)
        8. HIVE-18553.5.patch (71 kB, Ferdinand Xu)
        9. HIVE-18553.4.patch (40 kB, Ferdinand Xu)
        10. HIVE-18553.3.patch (37 kB, Ferdinand Xu)
        11. HIVE-18553.2.patch (7 kB, Ferdinand Xu)
        12. HIVE-18553.11.patch (179 kB, Ferdinand Xu)
        13. HIVE-18553.10.patch (179 kB, Ferdinand Xu)

          People

            Assignee: Ferdinand Xu (Ferd)
            Reporter: Vihang Karajgaonkar (vihangk1)
            Votes: 0
            Watchers: 4
