HIVE-27900

Hive cannot read Iceberg Parquet table


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 4.0.0-beta-1
    • Fix Version/s: Not Applicable
    • Component/s: Iceberg integration
    • Labels: None

    Description

      We found that, with Hive 4.0.0-beta-1, we cannot query an Iceberg Parquet table when vectorized execution is turned on.

      --spark-sql (Spark 3.4.1 + Iceberg 1.4.2)
      CREATE TABLE local.test.b_qqd_shop_rfm_parquet_snappy (
        a string, b string, c string)
      USING iceberg
      LOCATION '/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy'
      TBLPROPERTIES (
        'current-snapshot-id' = '5138351937447353683',
        'format' = 'iceberg/parquet',
        'format-version' = '2',
        'read.orc.vectorization.enabled' = 'true',
        'write.format.default' = 'parquet',
        'write.metadata.delete-after-commit.enabled' = 'true',
        'write.metadata.previous-versions-max' = '3',
        'write.parquet.compression-codec' = 'snappy');
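
      The report does not show how the table was populated; assuming a few rows are enough to reproduce, a minimal Spark SQL load (illustrative values only, not from the original report) could be:

      --spark-sql: illustrative sample rows
      INSERT INTO local.test.b_qqd_shop_rfm_parquet_snappy VALUES
        ('a1', 'b1', 'c1'),
        ('a2', 'b2', 'c2');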
      
      
      
      --hive-sql
      CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
      STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
      LOCATION 'hdfs://xxxxxxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
      TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
      
      
      set hive.default.fileformat=orc;
      set hive.default.fileformat.managed=orc;
      create table test_parquet_as_orc as select * from b_qqd_shop_rfm_parquet_snappy limit 100;
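
      To confirm that the failing scan really runs in vectorized mode, the plan can be inspected with Hive's EXPLAIN VECTORIZATION; the statement below simply wraps the repro query and is illustrative:

      --hive-sql: inspect the vectorization decisions for the failing query
      EXPLAIN VECTORIZATION DETAIL
      select * from b_qqd_shop_rfm_parquet_snappy limit 100;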
      
      
      
      
      
      
      , TaskAttempt 2 failed, info=[Error: Node: xxxx/xxx.xxxx.xx.xx: Error while running task ( failure ) : attempt_1696729618575_69586_1_00_000000_2:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
      at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
      at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
      at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
      at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
      at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
      at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
      at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
      at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:750)
      Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
      ... 16 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
      at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
      at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
      ... 19 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
      at org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:137)
      at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
      at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
      at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
      at org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator.process(VectorLimitOperator.java:108)
      at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
      at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
      at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
      at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
      ... 20 more
      Caused by: java.lang.NullPointerException
      at org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream.write(NonSyncByteArrayOutputStream.java:110)
      at org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite.writeString(LazyBinarySerializeWrite.java:280)
      at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow$VectorSerializeStringWriter.serialize(VectorSerializeRow.java:532)
      at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:316)
      at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:297)
      at org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:113)
      ... 28 more 

       

      1. Tez version: 0.10.3-SNAPSHOT.
      2. The Iceberg table is a copy-on-write (COW) table; the same error also occurs after inserting only a small amount of data.
      3. Reading an Iceberg ORC table works fine.
      4. Disabling vectorization and reading the Iceberg Parquet table works fine (see the example after this list).
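
      The workaround in item 4 corresponds to turning off vectorized execution for the session before re-running the repro query, using the standard hive.vectorized.execution.enabled switch:

      --hive-sql: workaround from item 4 - disable vectorized execution, then re-run the repro
      set hive.vectorized.execution.enabled=false;
      create table test_parquet_as_orc as select * from b_qqd_shop_rfm_parquet_snappy limit 100;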


          People

            Assignee: Unassigned
            Reporter: lisoda (yongzhi.shao)
            Votes: 0
            Watchers: 2
