Description
Hive 3.1 incorrectly reads back Avro and Parquet timestamps written by Hive 2.x. To demonstrate the problem, create a table using Hive 2.x with the session running in the America/Los_Angeles time zone:
hive> create table ts_‹format› (ts timestamp) stored as ‹format›;
hive> insert into ts_‹format› values ('2018-01-01 00:00:00.000');
Querying this table by issuing
hive> select * from ts_‹format›;
from different time zones using different versions of Hive and different storage formats gives the following results:
‹format› | Writer time zone (in Hive 2.x) | Reader time zone | Result in Hive 2.x reader | Result in Hive 3.1 reader |
Avro and Parquet | America/Los_Angeles | America/Los_Angeles | 2018-01-01 00:00:00.0 | 2018-01-01 08:00:00.0 |
Avro and Parquet | America/Los_Angeles | Europe/Paris | 2018-01-01 09:00:00.0 | 2018-01-01 08:00:00.0 |
Textfile and ORC | America/Los_Angeles | America/Los_Angeles | 2018-01-01 00:00:00.0 | 2018-01-01 00:00:00.0 |
Textfile and ORC | America/Los_Angeles | Europe/Paris | 2018-01-01 00:00:00.0 | 2018-01-01 00:00:00.0 |
Hive 3.1 clearly gives different results from Hive 2.x for timestamps stored in the Avro and Parquet formats. Apache ORC behaviour has not changed because ORC was modified to adjust timestamps in order to retain backwards compatibility. Textfile behaviour has not changed because its processing involves parsing and formatting rather than true serialization and deserialization, so it inherently had LocalDateTime semantics even in Hive 2.x.
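The Avro and Parquet rows of the table can be reproduced with plain java.time arithmetic. The sketch below is a simplified model and not Hive's actual code: it assumes that Hive 2.x stores the writer's wall-clock time as a UTC instant and converts it back using the reader's zone, while Hive 3.1 interprets the stored value as a zone-less LocalDateTime in UTC. The class and method names (TimestampSemantics, writeHive2, readHive2, readHive31) are hypothetical and exist only for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampSemantics {

    // Assumed Hive 2.x write path: the wall-clock value is interpreted in the
    // writer's time zone and stored as a UTC instant.
    static Instant writeHive2(LocalDateTime wallClock, ZoneId writerZone) {
        return wallClock.atZone(writerZone).toInstant();
    }

    // Assumed Hive 2.x read path: the stored instant is converted back to
    // wall-clock time in the reader's time zone.
    static LocalDateTime readHive2(Instant stored, ZoneId readerZone) {
        return stored.atZone(readerZone).toLocalDateTime();
    }

    // Assumed Hive 3.1 read path: the stored value is taken as-is in UTC,
    // so the reader's time zone plays no role.
    static LocalDateTime readHive31(Instant stored) {
        return LocalDateTime.ofInstant(stored, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        ZoneId losAngeles = ZoneId.of("America/Los_Angeles");
        ZoneId paris = ZoneId.of("Europe/Paris");

        // The value inserted in the example session.
        LocalDateTime inserted = LocalDateTime.of(2018, 1, 1, 0, 0, 0);
        Instant stored = writeHive2(inserted, losAngeles);

        // Each line corresponds to one Avro/Parquet cell of the table.
        System.out.println("Hive 2.x reader, Los Angeles: " + readHive2(stored, losAngeles));
        System.out.println("Hive 2.x reader, Paris:       " + readHive2(stored, paris));
        System.out.println("Hive 3.1 reader, any zone:    " + readHive31(stored));
    }
}
```

Under these assumptions, the 8-hour shift seen in Hive 3.1 is simply the UTC offset of America/Los_Angeles in January, which the Hive 2.x reader undoes only when it runs in the writer's zone.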
Issue Links
- is a child of: HIVE-21348 Execute the TIMESTAMP types roadmap (Open)
- is related to: HIVE-22167 TIMESTAMP - Backwards incompatible change: Hive 3.1 reads back binary RCFILE timestamps written by Hive 2.x incorrectly (Open)