[IMPALA-10491] Impala parquet scanner should use writer.time.zone when converting Hive timestamps - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: Impala 3.4.0
Fix Version/s: None
Component/s: Backend
Labels:
- parquet

Target Version: Product Backlog

Description

IMPALA-8721 reports some issues with Hive 3 and time zone conversion.

HIVE-21290 fixed some of those issues and also sets writer.time.zone in the Parquet file metadata, which provides a more reliable way to determine which time zone the timestamps were written in. For example:

tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/asdfgh/000000_0
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: reading another 1 footers
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file:        hdfs://localhost:20500/test-warehouse/asdfgh/000000_0
creator:     parquet-mr version 1.10.99.7.2.7.0-44 (build 27344fd5fdaa371e364c604f471b340f8bcf8936)
extra:       writer.date.proleptic = false
extra:       writer.time.zone = America/Los_Angeles
extra:       writer.model.name = 3.1.3000.7.2.7.0-44

We should use this time zone when converting timestamps. I think this should happen either always or only when convert_legacy_hive_parquet_utc_timestamps=true.
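To make the proposal concrete, here is a minimal Python sketch of the second option (convert only when convert_legacy_hive_parquet_utc_timestamps=true). It is not Impala's scanner code: the function names, the dict standing in for the Parquet key/value metadata, and the fallback to a session time zone when writer.time.zone is absent are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional
from zoneinfo import ZoneInfo  # Python 3.9+, relies on the system tz database

WRITER_ZONE_KEY = "writer.time.zone"  # key shown in the parquet-tools output


def pick_conversion_zone(file_metadata: Mapping[str, str],
                         session_zone: str,
                         convert_legacy: bool) -> Optional[ZoneInfo]:
    """Hypothetical helper: choose the zone used for UTC -> local conversion.

    Returns None when no conversion should be applied. Falling back to the
    session zone for files without writer.time.zone is an assumption.
    """
    if not convert_legacy:
        return None
    return ZoneInfo(file_metadata.get(WRITER_ZONE_KEY) or session_zone)


def utc_to_local(stored: datetime, zone: Optional[ZoneInfo]) -> datetime:
    """Treat a naive timestamp as UTC and shift it into `zone`."""
    if zone is None:
        return stored
    aware = stored.replace(tzinfo=timezone.utc)
    # Drop tzinfo again so the result is a plain wall-clock value.
    return aware.astimezone(zone).replace(tzinfo=None)


# Metadata as it appears in the file above, and a file without the key.
hive3_meta = {"writer.time.zone": "America/Los_Angeles"}
legacy_meta = {}

stored = datetime(2021, 2, 9, 4, 35)  # value as stored, normalized to UTC
with_writer_zone = utc_to_local(
    stored, pick_conversion_zone(hive3_meta, "Europe/Budapest", True))
with_session_zone = utc_to_local(
    stored, pick_conversion_zone(legacy_meta, "Europe/Budapest", True))
```

With this approach the result for a Hive 3 file no longer depends on the time zone of the reading session, only on the zone recorded by the writer.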

CC boroknagyz csringhofer

Issue Links

relates to

- HIVE-21290 Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time (Closed)
- IMPALA-8721 Wrong result when Impala reads a Hive written parquet TimeStamp column (Resolved)

People

Assignee: Unassigned

Reporter: Tim Armstrong

Votes: 0

Watchers: 3

Dates

Created: 09/Feb/21 04:35

Updated: 09/Feb/21 04:36