Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Impala 3.4.0
- None
Description
IMPALA-8721 reports some issues with Hive 3 and timezone conversion.
HIVE-21290 fixed some of these issues and also sets writer.time.zone in the Parquet metadata, which gives a better way to determine which time zone the writer used. For example:
tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/asdfgh/000000_0
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: reading another 1 footers
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file: hdfs://localhost:20500/test-warehouse/asdfgh/000000_0
creator: parquet-mr version 1.10.99.7.2.7.0-44 (build 27344fd5fdaa371e364c604f471b340f8bcf8936)
extra: writer.date.proleptic = false
extra: writer.time.zone = America/Los_Angeles
extra: writer.model.name = 3.1.3000.7.2.7.0-44
We should use this time zone when converting timestamps. I think we should do so either always or only when convert_legacy_hive_parquet_utc_timestamps=true.
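To make the proposal concrete, here is a minimal Python sketch of the conversion, not Impala's actual code. It assumes the file's key-value metadata is already available as a dict, that stored timestamps are UTC-normalized naive values, and that a caller-supplied fallback zone is used when writer.time.zone is absent. The function names and the fallback parameter are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs the system tz database

WRITER_TZ_KEY = "writer.time.zone"  # metadata key set by HIVE-21290


def pick_conversion_zone(kv_metadata, fallback_zone):
    """Prefer the writer's zone from the file metadata, else the fallback.

    fallback_zone is a hypothetical parameter standing in for whatever
    zone the reader would otherwise use.
    """
    name = kv_metadata.get(WRITER_TZ_KEY)
    return ZoneInfo(name) if name else ZoneInfo(fallback_zone)


def to_writer_local(utc_naive, kv_metadata, fallback_zone="UTC"):
    """Convert a UTC-normalized naive timestamp to the writer's local time."""
    zone = pick_conversion_zone(kv_metadata, fallback_zone)
    aware = utc_naive.replace(tzinfo=timezone.utc)
    # Drop tzinfo again so the result is a plain wall-clock timestamp.
    return aware.astimezone(zone).replace(tzinfo=None)


metadata = {"writer.time.zone": "America/Los_Angeles"}
stored = datetime(2021, 2, 9, 4, 26, 44)
print(to_writer_local(stored, metadata))  # 2021-02-08 20:26:44
```

Whether this path should apply always or only under convert_legacy_hive_parquet_utc_timestamps=true is the open question above. The sketch only shows the zone selection.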
Issue Links
- relates to
- HIVE-21290 Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time (Closed)
- IMPALA-8721 Wrong result when Impala reads a Hive written parquet TimeStamp column (Resolved)