Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 3.1.0
Description
HIVE-12192 and HIVE-20007 changed the way timestamp computations are performed and, to some extent, how timestamps are serialized and deserialized in files (Parquet, Avro).
In versions that include HIVE-12192 or HIVE-20007, the serialization in Parquet files is not backward compatible. In other words, writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with a version that does not include those changes may lead to different results, depending on the default timezone of the system.
Consider the following scenario where the default system timezone is set to US/Pacific.
At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
CREATE EXTERNAL TABLE employee(eid INT, birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee';
INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
SELECT * FROM employee;
1 | 1880-01-01 00:00:00 |
2 | 1884-01-01 00:00:00 |
3 | 1990-01-01 00:00:00 |
At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
CREATE EXTERNAL TABLE employee(eid INT, birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee';
SELECT * FROM employee;
1 | 1879-12-31 23:52:58 |
2 | 1884-01-01 00:00:00 |
3 | 1990-01-01 00:00:00 |
The timestamp for eid=1 in branch-2.3 is different from the one in master.
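The size of the discrepancy (7 minutes 2 seconds) matches the gap between local mean time and standard time for US/Pacific: the IANA tz database lists an offset of UTC-7:52:58 for America/Los_Angeles before 18 November 1883 and UTC-8:00 afterwards. The sketch below is only an illustration of that arithmetic, not Hive's actual code path; the assumption that the writer and the reader apply different offsets to the 1880 value is not stated in the report, and the helper name `reinterpret` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Offsets for US/Pacific (America/Los_Angeles) as listed in the IANA tz database.
LMT = timezone(-timedelta(hours=7, minutes=52, seconds=58))  # local mean time, before 1883-11-18
PST = timezone(-timedelta(hours=8))                          # standard time afterwards

FMT = "%Y-%m-%d %H:%M:%S"


def reinterpret(wall_clock: str, written_with: timezone, read_with: timezone) -> str:
    """Attach `written_with` to a wall-clock value, then render the same instant in `read_with`.

    Hypothetical helper: it models a writer and a reader that disagree on the
    UTC offset for a given date. It does not reproduce Hive's serialization logic.
    """
    local = datetime.strptime(wall_clock, FMT)
    instant = local.replace(tzinfo=written_with)
    return instant.astimezone(read_with).strftime(FMT)


# Assumed scenario: the pre-1883 value is written using LMT but read back using PST.
print(reinterpret("1880-01-01 00:00:00", LMT, PST))  # 1879-12-31 23:52:58

# After the 1883 switch to standard time both sides agree, so nothing shifts.
print(reinterpret("1884-01-01 00:00:00", PST, PST))  # 1884-01-01 00:00:00
```

Under this assumption, only timestamps before the 1883 transition shift, which is consistent with eid=2 and eid=3 reading back unchanged.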
Issue Links
- causes
  - HIVE-26270 Wrong timestamps when reading Hive 3.1.x Parquet files with vectorized reader (Closed)
- is caused by
  - HIVE-12192 Hive should carry out timestamp computations in UTC (Closed)
  - HIVE-20007 Hive should carry out timestamp computations in UTC (Closed)
- is contained by
  - HIVE-26751 Bug Fixes and Improvements for 3.2.0 release (Open)
- is related to
  - HIVE-25219 Backward incompatible timestamp serialization in Avro for certain timezones (Closed)
- relates to
  - HIVE-24074 Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x (Closed)