[HIVE-26612] INT64 Parquet timestamps cannot be read into BIGINT Hive type - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0-alpha-2
Component/s: Database/Schema
Labels:
- pull-request-available

Description

If a Parquet file has a column of type "int64 eventtime (TIMESTAMP(MILLIS,true))" and that column is read into a Hive BIGINT column, the following error is produced:

java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/xxxx/hive/itests/qtest/target/tmp/parquet_format_ts_as_bigint/part-00000/timestamp_as_bigint.parquet
	at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:213)
	at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:98)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
Caused by: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/xxxx/hive/itests/qtest/target/tmp/parquet_format_ts_as_bigint/part-00000/timestamp_as_bigint.parquet
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:624)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:531)
	at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:197)
	... 55 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/home/stamatis/Projects/Apache/hive/itests/qtest/target/tmp/parquet_format_ts_as_bigint/part-00000/timestamp_as_bigint.parquet
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:255)
	at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:87)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:89)
	at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:771)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:562)
	... 57 more
Caused by: java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$10$1
	at org.apache.parquet.io.api.PrimitiveConverter.addLong(PrimitiveConverter.java:105)
	at org.apache.parquet.column.impl.ColumnReaderBase$2$4.writeValue(ColumnReaderBase.java:301)
	at org.apache.parquet.column.impl.ColumnReaderBase.writeCurrentValueToConverter(ColumnReaderBase.java:410)
	at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:30)
	at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:230)
	... 63 more
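The innermost cause is thrown from PrimitiveConverter.addLong, and its message is only a class name. This suggests that the converter chosen for the column does not override addLong, so the call falls through to a base-class method that rejects it. The sketch below models that pattern with the JDK only; it is not Hive's or Parquet's code, and the names BaseConverter, BinaryOnlyConverter, LongConverter, and tryAddLong are hypothetical.

```java
public class AddLongFallbackSketch {

    // Hypothetical stand-in for a converter base class: each add* method
    // rejects the call unless a subclass overrides it, and reports the
    // runtime class name as the exception message.
    static abstract class BaseConverter {
        void addLong(long value) {
            throw new UnsupportedOperationException(getClass().getName());
        }

        void addBinary(byte[] value) {
            throw new UnsupportedOperationException(getClass().getName());
        }
    }

    // Hypothetical converter that only accepts binary values.
    static final class BinaryOnlyConverter extends BaseConverter {
        byte[] last;

        @Override
        void addBinary(byte[] value) {
            last = value;
        }
    }

    // Hypothetical converter that also accepts 64-bit integer values.
    static final class LongConverter extends BaseConverter {
        long last;

        @Override
        void addLong(long value) {
            last = value;
        }
    }

    // Returns "ok" when the converter accepts the value, otherwise the
    // exception message (the converter's class name).
    static String tryAddLong(BaseConverter converter, long value) {
        try {
            converter.addLong(value);
            return "ok";
        } catch (UnsupportedOperationException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAddLong(new BinaryOnlyConverter(), 1388617201000L));
        System.out.println(tryAddLong(new LongConverter(), 1388617201000L));
    }
}
```

Under this model, a column stored as int64 can only be read when the converter selected for the target Hive type implements addLong.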

The Parquet file can be created with Spark using the following steps:

spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")

[1]
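import java.sql.Timestamp
import spark.implicits._ // provides toDF; already imported in spark-shell

// Four sample rows: an integer type id and an event timestamp.
val df = Seq(
  (1, Timestamp.valueOf("2014-01-01 23:00:01")),
  (1, Timestamp.valueOf("2014-11-30 12:40:32")),
  (2, Timestamp.valueOf("2016-12-29 09:54:00")),
  (2, Timestamp.valueOf("2016-05-09 10:12:43"))
).toDF("typeid", "eventtime")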

[2]
[root@c4839-node3 test_parquet2]# parquet-tools schema part-00001-6c90b794-90b9-4cc0-afc5-2e49a4e96bad-c000.snappy.parquet
message spark_schema {
  required int32 typeid;
  optional int64 eventtime (TIMESTAMP(MILLIS,true));
}

[3]
[root@c4839-node3 test_parquet1]# parquet-tools schema part-00001-cb1aeebb-ec87-4273-82ec-911c4fb605b6-c000.snappy.parquet
message spark_schema {
  required int32 typeid;
  optional int96 eventtime;
}
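For reference, an int64 column annotated TIMESTAMP(MILLIS,true), as in [2], stores each value as a count of milliseconds since the Unix epoch, normalized to UTC. The sketch below shows the number a BIGINT read would be expected to surface for the first sample row. It assumes the writer's session time zone was UTC, which the report does not state; the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimestampMillisSketch {

    // Encodes a wall-clock time as epoch milliseconds, assuming the
    // wall clock is in UTC (an assumption, not stated in the report).
    static long toEpochMillisUtc(LocalDateTime wallClock) {
        return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Decodes epoch milliseconds back into an instant on the UTC timeline.
    static Instant fromEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        long millis = toEpochMillisUtc(LocalDateTime.of(2014, 1, 1, 23, 0, 1));
        System.out.println(millis);                  // prints 1388617201000
        System.out.println(fromEpochMillis(millis)); // prints 2014-01-01T23:00:01Z
    }
}
```

If the writer used a different session time zone, the stored number shifts by that zone's offset from UTC.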

Issue Links

- is caused by: HIVE-21215 Read Parquet INT64 timestamp (Closed)
- is required by: HIVE-23345 INT64 Parquet timestamps cannot be read into bigint Hive type (Closed)
- Testing discovered: HIVE-26658 INT64 Parquet timestamps cannot be mapped to most Hive numeric types (Closed)
- links to: GitHub Pull Request #3651

People

Assignee: Steve Carlin
Reporter: Steve Carlin
Votes: 0
Watchers: 2

Dates

Created: 07/Oct/22 23:03
Updated: 16/Nov/22 13:50
Resolved: 21/Oct/22 09:56

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 1h 10m