Hive / HIVE-26373

ClassCastException when reading timestamps from HBase table with Avro data


Details

    Description

      Consider an HBase table (e.g., HiveAvroTable) that has a column with Avro data in which timestamps are nested under complex/struct types.

      CREATE EXTERNAL TABLE hbase_avro_table(
      `key` string COMMENT '',
      `data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
      ROW FORMAT SERDE
        'org.apache.hadoop.hive.hbase.HBaseSerDe'
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES (
      'serialization.format'='1',
      'hbase.columns.mapping' = ':key,data:frV4',
      'data.frV4.serialization.type'='avro',
      'data.frV4.avro.schema.url'='path/to/avro/schema/for/column/filename.avsc'
      )
      TBLPROPERTIES (
      'hbase.table.name' = 'HiveAvroTable',
      'hbase.struct.autogenerate'='true');
      

      Any attempt to read the timestamp value from the nested struct leads to a ClassCastException.

      select data_frV4.dischargedate.value from hbase_avro_table;
      

      The query fails with the following stack trace:

      2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
              at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
              at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
              at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
              at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
              at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
              at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.Timestamp cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
              at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40)
              at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29)
              at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
              at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
              at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
              at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
              at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
              at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1059)
              at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
              at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
              at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
              at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
              at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
              at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
              ... 11 more 
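
      The "Caused by" frames show an object inspector casting its input to a lazy wrapper type. The failure mode can be reproduced in isolation with a minimal sketch. All class and method names below (CastFailureSketch, PlainTimestamp, LazyWrapper, unwrap) are hypothetical stand-ins and are not Hive classes; the sketch only assumes that the inspector casts unconditionally, as the exception message suggests.

```java
// Simplified model of the failing cast; none of these types are Hive classes.
class CastFailureSketch {

    /** Hypothetical stand-in for a plain (non-lazy) timestamp value. */
    static final class PlainTimestamp {
        final long epochMillis;
        PlainTimestamp(long epochMillis) { this.epochMillis = epochMillis; }
    }

    /** Hypothetical stand-in for the lazy wrapper the inspector expects. */
    static final class LazyWrapper {
        final PlainTimestamp value;
        LazyWrapper(PlainTimestamp value) { this.value = value; }
    }

    /** Mimics an inspector that assumes every non-null input is already wrapped. */
    static PlainTimestamp unwrap(Object o) {
        // Throws ClassCastException when o is not a LazyWrapper.
        return o == null ? null : ((LazyWrapper) o).value;
    }

    /** Returns true when unwrap(o) fails with a ClassCastException. */
    static boolean failsWithClassCast(Object o) {
        try {
            unwrap(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        PlainTimestamp ts = new PlainTimestamp(0L);
        System.out.println(failsWithClassCast(new LazyWrapper(ts))); // false: wrapped value
        System.out.println(failsWithClassCast(ts));                  // true: raw value
    }
}
```

      In this model the raw value fails for the same reason as in the trace: nothing upstream wrapped it before it reached the inspector.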
      

      The problem starts in the toLazyObject method of AvroLazyObjectInspector.java: the isPrimitive check returns false for Timestamp, so the Timestamp is never converted to a LazyTimestamp.

      The solution is to return true for Timestamps in the isPrimitive method.
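
      The effect of the proposed change can be illustrated with a simplified model of the dispatch. This is a sketch under stated assumptions, not Hive's actual source: IsPrimitiveSketch, its nested Timestamp and Lazy classes, and the treatTimestampAsPrimitive flag are hypothetical, and the exact set of types the real isPrimitive method accepts is assumed rather than quoted.

```java
// Simplified model of the toLazyObject dispatch; not Hive's actual source.
class IsPrimitiveSketch {

    /** Hypothetical stand-in for org.apache.hadoop.hive.common.type.Timestamp. */
    static final class Timestamp {
        final long epochMillis;
        Timestamp(long epochMillis) { this.epochMillis = epochMillis; }
    }

    /** Hypothetical stand-in for a lazy wrapper such as LazyTimestamp. */
    static final class Lazy {
        final Object value;
        Lazy(Object value) { this.value = value; }
    }

    /** Assumed baseline: boxed Java primitives and String count as primitive. */
    static boolean isBoxedOrString(Class<?> c) {
        return c == String.class || c == Boolean.class || c == Byte.class
            || c == Short.class || c == Integer.class || c == Long.class
            || c == Float.class || c == Double.class || c == Character.class;
    }

    /** The flag toggles the proposed behavior of also accepting Timestamp. */
    static boolean isPrimitive(Class<?> c, boolean treatTimestampAsPrimitive) {
        return c.isPrimitive() || isBoxedOrString(c)
            || (treatTimestampAsPrimitive && c == Timestamp.class);
    }

    /** Wraps primitive-like values; anything else passes through unwrapped. */
    static Object toLazyObject(Object field, boolean treatTimestampAsPrimitive) {
        if (field == null) {
            return null;
        }
        return isPrimitive(field.getClass(), treatTimestampAsPrimitive)
            ? new Lazy(field)
            : field;
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(0L);
        System.out.println(toLazyObject(ts, false) instanceof Lazy);     // false: passes through raw
        System.out.println(toLazyObject(ts, true) instanceof Lazy);      // true: wrapped
        System.out.println(toLazyObject("id-1", false) instanceof Lazy); // true: String is wrapped either way
    }
}
```

      With the flag off, the timestamp reaches downstream code unwrapped, which corresponds to the condition that leads to the ClassCastException; with the flag on, it is wrapped like any other primitive value.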

          People

            Assignee: Soumyakanti Das (soumyakanti.das)
            Reporter: Soumyakanti Das (soumyakanti.das)
            Votes: 0
            Watchers: 3

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 1h 10m