Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
Consider an HBase table (e.g., HiveAvroTable) that has a column with Avro data in which timestamps are nested under complex/struct types.
CREATE EXTERNAL TABLE hbase_avro_table(
  `key` string COMMENT '',
  `data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'serialization.format'='1',
  'hbase.columns.mapping' = ':key,data:frV4',
  'data.frV4.serialization.type'='avro',
  'data.frV4.avro.schema.url'='path/to/avro/schema/for/column/filename.avsc'
)
TBLPROPERTIES (
  'hbase.table.name' = 'HiveAvroTable',
  'hbase.struct.autogenerate'='true');
Any attempt to read the timestamp value from the nested struct leads to a ClassCastException.
select data_frV4.dischargedate.value from hbase_avro_table;
Below you can find the stack trace for the previous query:
2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.Timestamp cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40)
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29)
    at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
    at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1059)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
    ... 11 more
The problem starts in the toLazyObject method of AvroLazyObjectInspector.java: the primitive check there returns false for Timestamp, so the Timestamp is never converted to a LazyTimestamp. The value then reaches LazyTimestampObjectInspector unwrapped, and the cast to LazyPrimitive fails.
The solution is to make the isPrimitive method return true for Timestamp values.
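The failure mode and the fix can be illustrated with a minimal, self-contained sketch. It does not use Hive's real classes: TimestampValue, LazyWrapper, PrimitiveCheckSketch and all method names below are hypothetical stand-ins, and the list of types accepted by the primitive check is an assumption made for illustration only. The sketch only shows how a value that fails the primitive check stays unwrapped and later triggers a ClassCastException when a lazy inspector casts it.

```java
// All names here are hypothetical stand-ins, not Hive's real classes.

/** Plays the role of a deserialized Avro timestamp value. */
final class TimestampValue {
    final long epochMillis;
    TimestampValue(long epochMillis) { this.epochMillis = epochMillis; }
}

/** Plays the role of a lazy wrapper such as LazyTimestamp. */
final class LazyWrapper {
    final Object wrapped;
    LazyWrapper(Object wrapped) { this.wrapped = wrapped; }
}

final class PrimitiveCheckSketch {
    /** Check without the fix: the timestamp type is missing (type list is an assumption). */
    static boolean isPrimitiveBeforeFix(Class<?> type) {
        return type == String.class || type == Integer.class
            || type == Long.class || type == Boolean.class;
    }

    /** Check with the fix: timestamps are also reported as primitive. */
    static boolean isPrimitiveAfterFix(Class<?> type) {
        return isPrimitiveBeforeFix(type) || type == TimestampValue.class;
    }

    /** Wraps the field in a lazy object only when the primitive check accepts its type. */
    static Object toLazyObject(Object field, boolean withFix) {
        if (field == null) {
            return null;
        }
        Class<?> type = field.getClass();
        boolean primitive = withFix ? isPrimitiveAfterFix(type) : isPrimitiveBeforeFix(type);
        return primitive ? new LazyWrapper(field) : field;
    }

    /** Mimics a lazy inspector that unconditionally casts its input to the lazy wrapper. */
    static Object readThroughLazyInspector(Object field) {
        return ((LazyWrapper) field).wrapped;
    }

    public static void main(String[] args) {
        TimestampValue ts = new TimestampValue(0L);

        // With the fix, the value is wrapped and the downstream cast succeeds.
        Object wrapped = toLazyObject(ts, true);
        System.out.println("with fix, same value read back: "
            + (readThroughLazyInspector(wrapped) == ts));

        // Without the fix, the raw value reaches the inspector and the cast fails.
        try {
            readThroughLazyInspector(toLazyObject(ts, false));
        } catch (ClassCastException e) {
            System.out.println("without fix: ClassCastException");
        }
    }
}
```

In this sketch the fix is a single extra branch in the primitive check; the wrapping and reading code paths are unchanged, which mirrors the one-line nature of the solution described above.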
Issue Links
- relates to
  - HIVE-26379 hive.avro.timestamp.skip.conversion and other time zone conversion properties doesn't work for Hbase (Open)
  - HIVE-6147 Support avro data stored in HBase columns (Closed)